GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/2410

[CARBONDATA-2650][Datamap] Fix bugs in negative number of skipped blocklets

Currently in CarbonData, the default blocklet datamap is used first to prune blocklets, and then the other index datamaps are applied. However, those index datamaps prune at segment scope, so in some scenarios their pruned result is larger than that of the default datamap, which produces a negative number of skipped blocklets in the explain query output. Here we intersect the results after each pruning stage; if the pruned result becomes empty, we finish pruning early. (A minimal sketch of this flow follows this post.)

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed? `NO`
- [x] Any backward compatibility impacted? `NO`
- [ ] Document update required? `NO`
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added or why no new tests are required? `NO, tests will be added in another PR`
  - How it is tested? Please attach test report. `Tested in local`
  - Is it a performance related change? Please attach the performance test report. `NO`
  - Any additional information to help reviewers in testing this change. `NA`
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NA`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata issue_2650_negative_skipped_blocklets

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2410.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2410

----

commit a6508af91211c2b005a33bed4db2a7964bb27af6
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-06-26T06:11:17Z

    Fix bugs in negative number of skipped blocklets

----
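For illustration, a minimal, self-contained sketch of the intersect-and-exit-early flow described above. This is not the actual CarbonData code: blocklets are modeled as plain strings, the stand-in names (PruningSketch, intersectPrune, byDefault, byCg, byFg) are hypothetical, and the per-datamap prune results are passed in directly; only the use of commons-collections' CollectionUtils.intersection matches the patch.

import java.util.Arrays;
import java.util.List;

import org.apache.commons.collections.CollectionUtils;

// Sketch only: "default", "CG" and "FG" stand for the three pruning stages.
public class PruningSketch {

  @SuppressWarnings("unchecked")
  static List<String> intersectPrune(List<String> defaultPruned,
      List<String> cgPruned, List<String> fgPruned) {
    List<String> result = defaultPruned;
    if (result.isEmpty()) {
      return result; // nothing left, finish pruning early
    }
    // The CG datamap prunes in segment scope, so it may return blocklets the
    // default datamap already eliminated; intersecting keeps only the common
    // ones, so the count can never grow from one stage to the next.
    result = (List<String>) CollectionUtils.intersection(result, cgPruned);
    if (result.isEmpty()) {
      return result;
    }
    // Same for the FG datamap.
    result = (List<String>) CollectionUtils.intersection(result, fgPruned);
    return result;
  }

  public static void main(String[] args) {
    // The default datamap keeps 2 blocklets, but the CG datamap (segment
    // scope) returns 3; replacing the result instead of intersecting it is
    // what made the explain output report a negative skipped count.
    List<String> byDefault = Arrays.asList("b1", "b2");
    List<String> byCg = Arrays.asList("b1", "b2", "b3");
    List<String> byFg = Arrays.asList("b1");
    System.out.println(intersectPrune(byDefault, byCg, byFg)); // prints [b1]
  }
}

---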
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6546/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5375/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2410

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5449/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6560/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5392/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2410

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5466/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2410

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6573/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5402/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2410

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5449/

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199047095

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
--- End diff --

It is better to give it a different name; initially it is for main index pruning.

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199047134

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
         dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
-
     ExplainCollector.recordDefaultDataMapPruning(
-        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
+    }
     DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     // Get the available CG datamaps and prune further.
     DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
     if (cgDataMapExprWrapper != null) {
       // Prune segments from already pruned blocklets
-      pruneSegments(segmentIds, prunedBlocklets);
+      pruneSegments(segmentIds, finalPrunedBlocklets);
+      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
       // Again prune with CG datamap.
       if (distributedCG && dataMapJob != null) {
-        prunedBlocklets = DataMapUtil
+        cgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
--- End diff --

Can you move the function name to the previous line to make the formatting nicer?

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199047302

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
         dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
-
     ExplainCollector.recordDefaultDataMapPruning(
-        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
+    }
     DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     // Get the available CG datamaps and prune further.
     DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
     if (cgDataMapExprWrapper != null) {
       // Prune segments from already pruned blocklets
-      pruneSegments(segmentIds, prunedBlocklets);
+      pruneSegments(segmentIds, finalPrunedBlocklets);
+      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
       // Again prune with CG datamap.
       if (distributedCG && dataMapJob != null) {
-        prunedBlocklets = DataMapUtil
+        cgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
                partitionsToPrune);
       } else {
-        prunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
+        cgPrunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
       }
-
+      // since index datamap prune in segment scope,
+      // the result need to intersect with previous pruned result
+      finalPrunedBlocklets = (List) CollectionUtils.intersection(
+          finalPrunedBlocklets, cgPrunedBlocklets);
       ExplainCollector.recordCGDataMapPruning(
-          cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+          cgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    }
+
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
     }
     // Now try to prune with FG DataMap.
     if (isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
       DataMapExprWrapper fgDataMapExprWrapper = chooser.chooseFGDataMap(resolver);
       if (fgDataMapExprWrapper != null) {
         // Prune segments from already pruned blocklets
-        pruneSegments(segmentIds, prunedBlocklets);
-        prunedBlocklets = DataMapUtil
+        pruneSegments(segmentIds, finalPrunedBlocklets);
+        List<ExtendedBlocklet> fgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, fgDataMapExprWrapper, dataMapJob,
                partitionsToPrune);
-
+        finalPrunedBlocklets = (List) CollectionUtils.intersection(
+            finalPrunedBlocklets, fgPrunedBlocklets);
         ExplainCollector.recordFGDataMapPruning(
-            fgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+            fgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
       }
     }
     // TODO: add a else branch to push FGDataMap pruning to reader side
--- End diff --

This TODO can be removed now.

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199310661

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
    DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
--- End diff --

Fixed. Used the original variable name 'prunedBlocklets' to represent the final output.

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199310663

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
         dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
-
     ExplainCollector.recordDefaultDataMapPruning(
-        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
+    }
     DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     // Get the available CG datamaps and prune further.
     DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
     if (cgDataMapExprWrapper != null) {
       // Prune segments from already pruned blocklets
-      pruneSegments(segmentIds, prunedBlocklets);
+      pruneSegments(segmentIds, finalPrunedBlocklets);
+      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
       // Again prune with CG datamap.
       if (distributedCG && dataMapJob != null) {
-        prunedBlocklets = DataMapUtil
+        cgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
--- End diff --

fixed

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2410#discussion_r199310665

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
     // First prune using default datamap on driver side.
     DataMapExprWrapper dataMapExprWrapper = DataMapChooser
         .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
-    List<ExtendedBlocklet> prunedBlocklets =
+    List<ExtendedBlocklet> finalPrunedBlocklets =
         dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
-
     ExplainCollector.recordDefaultDataMapPruning(
-        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
+    }
     DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     // Get the available CG datamaps and prune further.
     DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
     if (cgDataMapExprWrapper != null) {
       // Prune segments from already pruned blocklets
-      pruneSegments(segmentIds, prunedBlocklets);
+      pruneSegments(segmentIds, finalPrunedBlocklets);
+      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
       // Again prune with CG datamap.
       if (distributedCG && dataMapJob != null) {
-        prunedBlocklets = DataMapUtil
+        cgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
                partitionsToPrune);
       } else {
-        prunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
+        cgPrunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
       }
-
+      // since index datamap prune in segment scope,
+      // the result need to intersect with previous pruned result
+      finalPrunedBlocklets = (List) CollectionUtils.intersection(
+          finalPrunedBlocklets, cgPrunedBlocklets);
       ExplainCollector.recordCGDataMapPruning(
-          cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+          cgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
+    }
+
+    if (finalPrunedBlocklets.size() == 0) {
+      return finalPrunedBlocklets;
     }
     // Now try to prune with FG DataMap.
     if (isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
       DataMapExprWrapper fgDataMapExprWrapper = chooser.chooseFGDataMap(resolver);
       if (fgDataMapExprWrapper != null) {
         // Prune segments from already pruned blocklets
-        pruneSegments(segmentIds, prunedBlocklets);
-        prunedBlocklets = DataMapUtil
+        pruneSegments(segmentIds, finalPrunedBlocklets);
+        List<ExtendedBlocklet> fgPrunedBlocklets = DataMapUtil
            .executeDataMapJob(carbonTable, resolver, segmentIds, fgDataMapExprWrapper, dataMapJob,
                partitionsToPrune);
-
+        finalPrunedBlocklets = (List) CollectionUtils.intersection(
+            finalPrunedBlocklets, fgPrunedBlocklets);
         ExplainCollector.recordFGDataMapPruning(
-            fgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
+            fgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
       }
     }
     // TODO: add a else branch to push FGDataMap pruning to reader side
--- End diff --

fixed

---
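A note on the cast seen in the diff, for readers of the patch: CollectionUtils.intersection in commons-collections is not generic and returns a raw Collection, hence the (List) cast back to List<ExtendedBlocklet>. The intersection also guarantees that the blocklet count recorded by ExplainCollector never increases from one pruning stage to the next, which is exactly what rules out a negative skipped count.

---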
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2410

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5531/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2410

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6676/

---