Zhangshunyu opened a new pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833

### Why is this PR needed?
The tableupdatestatus file always keeps the segment info, even after a compacted segment has already been deleted. This makes the file grow quickly, which is bad for performance.

### What changes were proposed in this PR?
Remove the entries of invalid segments, that is, segments that are no longer in table status.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [hidden email]
CarbonDataQA1 commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-655913486

Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1594/
CarbonDataQA1 commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-655914263

Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3334/
Indhumathi27 commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452647862

##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
       mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
     }
-    segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+    List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();
+    for (SegmentUpdateDetails updateDetails : oldList) {
+      for (LoadMetadataDetails details : segmentUpdateStatusManager.getLoadMetadataDetails()) {
+        if (updateDetails.getSegmentName().equalsIgnoreCase(details.getLoadName())) {
+          // we should only keep the update info of segments in table status, especially after
+          // compaction and clean files some compacted segments will removed. It can keep

Review comment:
```suggestion
          // compaction and clean files some compacted segments will be removed. It can keep
```
marchpure commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452657326

##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
       mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
     }
-    segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+    List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();

Review comment:
Map<String, LoadMetadataDetails> details <- segmentUpdateStatusManager.getLoadMetadataDetails()
for (SegmentUpdateDetails updateDetail : oldList) {
  if (details.contains(updateDetail)) {
  }
}
Zhangshunyu commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452657347

##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
       mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
     }
-    segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+    List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();
+    for (SegmentUpdateDetails updateDetails : oldList) {
+      for (LoadMetadataDetails details : segmentUpdateStatusManager.getLoadMetadataDetails()) {
+        if (updateDetails.getSegmentName().equalsIgnoreCase(details.getLoadName())) {
+          // we should only keep the update info of segments in table status, especially after
+          // compaction and clean files some compacted segments will removed. It can keep

Review comment:
@Indhumathi27 OK
marchpure commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452657326

##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
       mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
     }
-    segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+    List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();

Review comment:
Map<String, LoadMetadataDetails> details <- segmentUpdateStatusManager.getLoadMetadataDetails()
for (SegmentUpdateDetails updateDetail : oldList) {
  if (details.contains(updateDetail.getSegmentName())) {
  }
}
Zhangshunyu commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452663050

##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
       mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
     }
-    segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+    List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();

Review comment:
@marchpure OK, I will use a HashSet to check.
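
For readers following the thread, the HashSet-based check agreed on here could look roughly like the sketch below. It is not the code that was merged: `LoadEntry` and `UpdateEntry` are hypothetical stand-ins for `LoadMetadataDetails` and `SegmentUpdateDetails`, `keepValidSegments` is an illustrative name, and lower-casing the names is only an assumed approximation of the `equalsIgnoreCase` comparison in the original diff. The idea is to collect the load names from table status into a set once, then keep an update entry only if its segment name is in that set, which replaces the nested loop with a single pass over each list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch only: hypothetical stand-ins, not the real CarbonData classes.
final class UpdateStatusFilterSketch {

  /** Stand-in for LoadMetadataDetails; carries only the load (segment) name. */
  static final class LoadEntry {
    private final String loadName;

    LoadEntry(String loadName) {
      this.loadName = loadName;
    }

    String getLoadName() {
      return loadName;
    }
  }

  /** Stand-in for SegmentUpdateDetails; carries only the segment name. */
  static final class UpdateEntry {
    private final String segmentName;

    UpdateEntry(String segmentName) {
      this.segmentName = segmentName;
    }

    String getSegmentName() {
      return segmentName;
    }
  }

  /** Returns the update entries whose segment still appears among the loads. */
  static List<UpdateEntry> keepValidSegments(List<UpdateEntry> oldList, List<LoadEntry> loads) {
    // Build the lookup set once; names are lower-cased so the match is
    // case-insensitive (assumed equivalent to equalsIgnoreCase for segment names).
    Set<String> validSegments = new HashSet<>();
    for (LoadEntry load : loads) {
      validSegments.add(load.getLoadName().toLowerCase(Locale.ROOT));
    }

    // Keep only entries that refer to a segment present in the set.
    List<UpdateEntry> kept = new ArrayList<>();
    for (UpdateEntry update : oldList) {
      if (validSegments.contains(update.getSegmentName().toLowerCase(Locale.ROOT))) {
        kept.add(update);
      }
    }
    return kept;
  }
}
```

In this sketch the filtered list returned by `keepValidSegments` would be what gets written to the update status file in place of `oldList`. The order of the surviving entries is preserved.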
CarbonDataQA1 commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-656581036

Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1607/
CarbonDataQA1 commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-656582778

Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3347/
marchpure commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-656653483

LGTM
asfgit closed pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833