Github user manishnalla1994 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3047#discussion_r245003004
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,23 @@ object CarbonStore {
val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
// for streaming segment, we should get the actual size from the index file
// since it is continuously inserting data
- val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+ val segmentDir = CarbonTablePath
+ .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
(indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
} else {
// for batch segment, we can get the data size from table status file directly
- (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
- if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+ if (null == load.getDataSize || null == load.getIndexSize) {
+ // If either dataSize or indexSize is null, then we calculate the correct
+ // size and assign it
+ val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
--- End diff --
As it is a metadata operation, we compute the size only once and persist it by passing 'true' to 'calculateDataIndexSize', so the computed value can also be reused by later callers.
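
For reference, a minimal sketch of the intent here, not code from this PR: the helper name 'sizesWithFallback' is hypothetical, and the map keys are assumed from 'CarbonCommonConstants' and may differ by version.

```scala
// Hedged sketch, not from this PR: illustrates why 'true' is passed to
// CarbonUtil.calculateDataIndexSize. The key names below are assumptions.
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.metadata.schema.table.CarbonTable
import org.apache.carbondata.core.util.CarbonUtil

def sizesWithFallback(carbonTable: CarbonTable): (Long, Long) = {
  // Passing 'true' asks calculateDataIndexSize to persist the computed
  // sizes as well, so subsequent readers can use the stored values
  // instead of recomputing them from the segment files.
  val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
  (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).longValue(),
    dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).longValue())
}
```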
---