GitHub user manishnalla1994 opened a pull request:
https://github.com/apache/carbondata/pull/3047

[CARBONDATA-3223] Fixed Wrong Datasize and Indexsize calculation for old store using Show Segments

Problem: A table created and loaded on an older version (1.1) showed a data-size and index-size of 0B when refreshed on the new version. This happened because when the stored data-size came back as "null" we did not compute it, and instead assigned it a value of 0 directly.

Solution: Compute the correct data-size and index-size using the CarbonTable.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata Datasize0Issue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3047.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3047

----

commit 6bf65d7a0b42e8d9a822fd234a510550bd8d2f17
Author: manishnalla1994 <manish.nalla1994@...>
Date: 2019-01-02T12:30:36Z

    Fixed Wrong Datasize and Indexsize calculation for old store

----

---
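In essence, the change replaces the "null means 0" fallback in CarbonStore.showSegments with a real computation. A minimal sketch of the idea, using only the APIs visible in the review diffs below; the null check and the persist flag are both refined during review, so treat this as illustrative rather than the merged code:

```scala
// Batch-segment branch of CarbonStore.showSegments (sketch).
// `load` is one segment entry read from the table status file.
val (dataSize, indexSize) =
  if (null == load.getDataSize || null == load.getIndexSize) {
    // Segments written by the old store (1.1) carry null sizes in the
    // table status file; compute the real sizes from the table on disk
    // instead of reporting 0B. The second argument controls whether the
    // computed sizes are persisted back to the table status file.
    val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
    (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
      dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
  } else {
    // Sizes are already recorded, use them directly.
    (load.getDataSize.toLong, load.getIndexSize.toLong)
  }
```

---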
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2124/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2330/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10378/

---
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3047#discussion_r244895354

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,21 @@ object CarbonStore {
     val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
       // for streaming segment, we should get the actual size from the index file
       // since it is continuously inserting data
-      val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+      val segmentDir = CarbonTablePath
+        .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
       val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
       val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
       (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
     } else {
       // for batch segment, we can get the data size from table status file directly
-      (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-        if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+      if (null == load.getDataSize && null == load.getIndexSize) {
+        val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
+        (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
+          dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
+      } else {
+        (load.getDataSize.toLong,
--- End diff --

If only one of load.getDataSize and load.getIndexSize is null, this will throw an exception; I think that case should be handled.

---
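The concern here: with the `&&` check, a table status entry in which only one of the two sizes is null falls into the else branch, and calling `.toLong` on the null field throws. A hedged sketch of the guard the reviewer is asking for, matching the version that shows up in the later diffs:

```scala
// Recompute if either size is missing, so a half-populated table status
// entry can never reach the .toLong calls on a null field.
if (null == load.getDataSize || null == load.getIndexSize) {
  val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
  (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
    dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
} else {
  (load.getDataSize.toLong, load.getIndexSize.toLong)
}
```

---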
Github user manishnalla1994 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3047#discussion_r244911752

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,21 @@ object CarbonStore {
     val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
       // for streaming segment, we should get the actual size from the index file
       // since it is continuously inserting data
-      val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+      val segmentDir = CarbonTablePath
+        .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
       val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
       val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
       (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
     } else {
       // for batch segment, we can get the data size from table status file directly
-      (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-        if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+      if (null == load.getDataSize && null == load.getIndexSize) {
+        val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
+        (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
+          dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
+      } else {
+        (load.getDataSize.toLong,
--- End diff --

Yes, fixed it now.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2135/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10389/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2341/

---
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3047#discussion_r244922117

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,23 @@ object CarbonStore {
     val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
       // for streaming segment, we should get the actual size from the index file
       // since it is continuously inserting data
-      val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+      val segmentDir = CarbonTablePath
+        .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
       val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
       val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
       (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
     } else {
       // for batch segment, we can get the data size from table status file directly
-      (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-        if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+      if (null == load.getDataSize || null == load.getIndexSize) {
+        // If either of datasize or indexsize comes to be null then we calculate
+        // the correct size and assign it
+        val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
--- End diff --

The boolean flag in this method call controls whether the computed data and index sizes are written back to the table status file. Pass the flag as true so that it computes the sizes and updates the table status file; this avoids recalculating them on every Show Segments call.

---
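For reference, a sketch of the suggested call, assuming the flag semantics described in the comment above (the second argument persists the computed sizes):

```scala
// true => also write the computed sizes back to the table status file, so
// subsequent SHOW SEGMENTS calls can read them directly instead of recomputing.
val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
```

---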
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3047#discussion_r244920921

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -46,9 +47,9 @@ object CarbonStore {

   def showSegments(
       limit: Option[String],
-      tablePath: String,
+      carbonTable: CarbonTable,
--- End diff --

Move `carbonTable` to be the first argument of the method.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2143/

---
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/3047

LGTM...can be merged once build passes

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10397/

---
Github user manishnalla1994 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3047#discussion_r244957693

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -46,9 +47,9 @@ object CarbonStore {

   def showSegments(
       limit: Option[String],
-      tablePath: String,
+      carbonTable: CarbonTable,
--- End diff --

Done.

---
Github user manishnalla1994 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3047#discussion_r244957746

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,23 @@ object CarbonStore {
     val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
       // for streaming segment, we should get the actual size from the index file
       // since it is continuously inserting data
-      val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+      val segmentDir = CarbonTablePath
+        .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
       val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
       val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
       (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
     } else {
       // for batch segment, we can get the data size from table status file directly
-      (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-        if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+      if (null == load.getDataSize || null == load.getIndexSize) {
+        // If either of datasize or indexsize comes to be null then we calculate
+        // the correct size and assign it
+        val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
--- End diff --

Fixed.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2349/

---
Github user manishnalla1994 commented on the issue:
https://github.com/apache/carbondata/pull/3047

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3047

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2361/

---
Github user KanakaKumar commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3047#discussion_r244980360

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -101,14 +102,23 @@ object CarbonStore {
     val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
       // for streaming segment, we should get the actual size from the index file
       // since it is continuously inserting data
-      val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
+      val segmentDir = CarbonTablePath
+        .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
       val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
       val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
       (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
     } else {
       // for batch segment, we can get the data size from table status file directly
-      (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-        if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
+      if (null == load.getDataSize || null == load.getIndexSize) {
+        // If either of datasize or indexsize comes to be null then we calculate
+        // the correct size and assign it
+        val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
--- End diff --

Show Segments is a read-only query; I think we should not perform a write operation inside a query. So I feel it is better either to calculate the sizes every time and show them, or simply to display them as not available.

---
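A sketch of the read-only alternative suggested here, assuming the same flag semantics described earlier in the thread: pass false so the query writes nothing, at the cost of recomputing the sizes on every call:

```scala
// false => compute only; SHOW SEGMENTS stays a pure read and never touches
// the table status file (sizes are recomputed on each call instead).
val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
```

---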