akashrn5 opened a new pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719 ### Why is this PR needed? List files was done during rebuild of SI, which will be costly in case of S3 and OBS ### What changes were proposed in this PR? avoided list files and delete files handled through segmentFileStore ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No(existing tests will takecare) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
Indhumathi27 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410097884 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +284,40 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(carbonLoadModel: CarbonLoadModel, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { + // delete the index/merge index carbonFile of old data files + val loadModel = SecondaryIndexUtil Review comment: Why LoadModel is formed again?, as we already have carbonLoadModel in methodParams ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410099985 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +284,40 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(carbonLoadModel: CarbonLoadModel, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { + // delete the index/merge index carbonFile of old data files + val loadModel = SecondaryIndexUtil + .getCarbonLoadModel(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable, + carbonLoadModel.getLoadMetadataDetails, + carbonLoadModel.getFactTimeStamp, + carbonLoadModel.getColumnCompressor) + var indexFiles: util.Map[String, String] = null + validSegments.asScala.foreach { segment => + // for old store scenario + if (segment.getSegmentFileName == null) { + val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath, + segment.getSegmentNo) + indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(segmentPath) + } else { + val segmentFileStore = new SegmentFileStore(indexCarbonTable.getTablePath, + segment.getSegmentFileName) + indexFiles = segmentFileStore.getIndexOrMergeFiles + } + } + val indexFilesToDelete = indexFiles.keySet().asScala.filter { indexFile => Review comment: I think for each segment we have to delete old indexFiles. This delete code has to be moved inside foreach loop? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615154725 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1051/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410119240 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +284,40 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(carbonLoadModel: CarbonLoadModel, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { + // delete the index/merge index carbonFile of old data files + val loadModel = SecondaryIndexUtil Review comment: you are right, removed ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410119308 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +284,40 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(carbonLoadModel: CarbonLoadModel, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { + // delete the index/merge index carbonFile of old data files + val loadModel = SecondaryIndexUtil + .getCarbonLoadModel(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable, + carbonLoadModel.getLoadMetadataDetails, + carbonLoadModel.getFactTimeStamp, + carbonLoadModel.getColumnCompressor) + var indexFiles: util.Map[String, String] = null + validSegments.asScala.foreach { segment => + // for old store scenario + if (segment.getSegmentFileName == null) { + val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath, + segment.getSegmentNo) + indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(segmentPath) + } else { + val segmentFileStore = new SegmentFileStore(indexCarbonTable.getTablePath, + segment.getSegmentFileName) + indexFiles = segmentFileStore.getIndexOrMergeFiles + } + } + val indexFilesToDelete = indexFiles.keySet().asScala.filter { indexFile => Review comment: yes ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615211151 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1054/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615216132 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2767/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410548764 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +286,35 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long, Review comment: make it private and move factTimeStamp to next line ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410549441 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +286,35 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { + // delete the index/merge index carbonFile of old data files + validSegments.asScala.foreach { segment => + var indexFiles: util.Map[String, String] = null + // for old store scenario + if (segment.getSegmentFileName == null) { + val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath, + segment.getSegmentNo) + indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(segmentPath) + } else { + val segmentFileStore = new SegmentFileStore(indexCarbonTable.getTablePath, + segment.getSegmentFileName) + indexFiles = segmentFileStore.getIndexOrMergeFiles + } + val indexFilesToDelete = indexFiles.keySet().asScala.filter { indexFile => + DataFileUtil.getTimeStampFromFileName(indexFile).toLong > + factTimeStamp + } + indexFilesToDelete.foreach { indexFile => Review comment: no need to declare local variable indexFilesToDelete ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410550100 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +286,35 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { + // delete the index/merge index carbonFile of old data files + validSegments.asScala.foreach { segment => + var indexFiles: util.Map[String, String] = null + // for old store scenario + if (segment.getSegmentFileName == null) { Review comment: move line 299 to 307 as a function in CarbonTable to return the index file list for specified segment list ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410550270 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -197,6 +200,9 @@ object SecondaryIndexUtil { } if (finalMergeStatus) { if (null != mergeStatus && mergeStatus.length != 0) { + deleteOldIndexOrMergeIndexFiles(carbonLoadModel.getFactTimeStamp, Review comment: move carbonLoadModel.getFactTimeStamp to next line ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410655298 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -197,6 +200,9 @@ object SecondaryIndexUtil { } if (finalMergeStatus) { if (null != mergeStatus && mergeStatus.length != 0) { + deleteOldIndexOrMergeIndexFiles(carbonLoadModel.getFactTimeStamp, Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410655315 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +286,35 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { + // delete the index/merge index carbonFile of old data files + validSegments.asScala.foreach { segment => + var indexFiles: util.Map[String, String] = null + // for old store scenario + if (segment.getSegmentFileName == null) { Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410655316 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +286,35 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long, + validSegments: util.List[Segment], + indexCarbonTable: CarbonTable): Unit = { + // delete the index/merge index carbonFile of old data files + validSegments.asScala.foreach { segment => + var indexFiles: util.Map[String, String] = null + // for old store scenario + if (segment.getSegmentFileName == null) { + val segmentPath = CarbonTablePath.getSegmentPath(indexCarbonTable.getTablePath, + segment.getSegmentNo) + indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(segmentPath) + } else { + val segmentFileStore = new SegmentFileStore(indexCarbonTable.getTablePath, + segment.getSegmentFileName) + indexFiles = segmentFileStore.getIndexOrMergeFiles + } + val indexFilesToDelete = indexFiles.keySet().asScala.filter { indexFile => + DataFileUtil.getTimeStampFromFileName(indexFile).toLong > + factTimeStamp + } + indexFilesToDelete.foreach { indexFile => Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#discussion_r410655318 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ########## @@ -280,6 +286,35 @@ object SecondaryIndexUtil { } } + /** + * This method deletes the old index files or merge index file after data files merge + */ + def deleteOldIndexOrMergeIndexFiles(factTimeStamp: Long, Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615734963 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1066/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3719: [CARBONDATA-3754]avoid listing index files during SI rebuild
URL: https://github.com/apache/carbondata/pull/3719#issuecomment-615782837 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2779/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
Free forum by Nabble | Edit this page |