GitHub user Xaprice opened a pull request:
https://github.com/apache/carbondata/pull/1812 [CARBONDATA-2033]support user specified segments in major compation Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? **no** - [ ] Any backward compatibility impacted? **no** - [x] Document update required? **Yes, data-management-on-carbondata.md has been updated.** - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? **yes** - How it is tested? Please attach test report. **test on cluster with 7 nodes** - Is it a performance related change? Please attach the performance test report. **no** - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Xaprice/carbondata specified_segs_in_major_compact Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1812.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1812 ---- commit 96bddafbc9edf48cbb427a75d267178cc1cef2f8 Author: Jin Zhou <xaprice@...> Date: 2018-01-16T09:02:51Z [CARBONDATA-2033]support user specified segments in major compation ---- --- |
Github user Xaprice commented on the issue:
https://github.com/apache/carbondata/pull/1812 Hi @chenliang613 , can you please take a look? --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2828/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1593/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1812 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2923/ --- |
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1812 retest this please --- |
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1812 please change the title to : [CARBONDATA-2033] Support user specified segments in major compaction --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1812#discussion_r162245791 --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java --- @@ -441,6 +452,30 @@ public int compare(LoadMetadataDetails seg1, LoadMetadataDetails seg2) { }); } + /** + * This method will return the list of loads which are specified by user in SQL. + * + * @param listOfSegmentsLoadedInSameDateInterval + * @param segmentIds + * @return + */ + private static List<LoadMetadataDetails> identitySegmentsToBeMergedBasedOnSpecifiedSegments( + List<LoadMetadataDetails> listOfSegmentsLoadedInSameDateInterval, + Set<String> segmentIds) { + List<LoadMetadataDetails> listOfSegmentsSpecified = + new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE); + if (segmentIds != null && segmentIds.size() != 0) { + for (LoadMetadataDetails detail : listOfSegmentsLoadedInSameDateInterval) { + if (isSegmentValid(detail) && segmentIds.contains(detail.getLoadName())) { --- End diff -- If the specified segment is not valid, better throw exception about invalid segments instead of ignoring it --- |
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1812 Two questions: 1. Why only consider major compaction with specified segments, no need to consider minor compaction? 2. Whether can keep consistent syntax as "query with specified segments", or not ? a. First set segment id : "SET carbon.input.segments.dbname.tablename=1,3" b.Do compaction : ALTER TABLE tablename compact 'MAJOR' --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1812 @Xaprice I think we should have validation for the order of segments to be merged. For suppose we have segments of 1 to 8, and the user gives the compaction on 1, 5, 8 then this should not be valid as it will impact the order of data it is inserted. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1690/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2923/ --- |
In reply to this post by qiuchenjian-2
Github user Xaprice commented on the issue:
https://github.com/apache/carbondata/pull/1812 @chenliang613 For question 1: I thought minor compaction are mainly used in auto-merging scenario. But after reconsidering this feature, maybe it's better to support both major and minor compaction. I will add support of minor compaction soon. For question 2: I will follow your advice and modify the syntax to keep consistent syntax as "query with specified segments". --- |
In reply to this post by qiuchenjian-2
Github user Xaprice commented on the issue:
https://github.com/apache/carbondata/pull/1812 @ravipesala Compacting adjacent segments is certainly the best practice in most cases. But is it not flexible enough to take it as a mandatory rule? --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4022/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2778/ --- |
In reply to this post by qiuchenjian-2
Github user Xaprice commented on the issue:
https://github.com/apache/carbondata/pull/1812 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4026/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2781/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1812 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4030/ --- |
Free forum by Nabble | Edit this page |