GitHub user ravipesala opened a pull request:
    https://github.com/apache/carbondata/pull/2204

    [WIP] Added CG prune before FG prune.

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

     - [ ] Any interfaces changed?
     - [ ] Any backward compatibility impacted?
     - [ ] Document update required?
     - [ ] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata datamap-prune-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2204.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2204

----
commit 4c0a4efaaaa6f4f23d274e75e333f18e3d5b9226
Author: ravipesala <ravi.pesala@...>
Date:   2018-04-21T16:29:50Z

    Added CG prune before FG prune.

----

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2204

    Build Success with Spark 2.2.1, Please check CI
    http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4103/

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2204

    Build Success with Spark 2.1.0, Please check CI
    http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5283/

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183230061

    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }

    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    --- End diff --

    I think better to rename to `chooseFGDataMap`

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183230067

    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }

    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    +      throws IOException {
    +    if (resolverIntf != null) {
    +      Expression expression = resolverIntf.getFilterExpression();
    +      // First check for FG datamaps if any exist
    +      List<TableDataMap> allDataMapFG =
    +          DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
    +      ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
    +      if (tuple.dataMapExprWrapper != null) {
    +        return tuple.dataMapExprWrapper;
    +      }
    +    }
    +    // Return the default datamap if no other datamap exists.
    +    return null;
    +  }
    +
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseCG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    --- End diff --

    rename to `chooseCGDataMap`

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183230077

    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }

    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    --- End diff --

    change `datamap` to `FG datamap`

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183230082

    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }

    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    +      throws IOException {
    +    if (resolverIntf != null) {
    +      Expression expression = resolverIntf.getFilterExpression();
    +      // First check for FG datamaps if any exist
    +      List<TableDataMap> allDataMapFG =
    +          DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
    +      ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
    +      if (tuple.dataMapExprWrapper != null) {
    +        return tuple.dataMapExprWrapper;
    +      }
    +    }
    +    // Return the default datamap if no other datamap exists.
    +    return null;
    +  }
    +
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    --- End diff --

    `datamap` to `CG datamap`

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183230216

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
                 .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
                     CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
         DataMapExprWrapper dataMapExprWrapper =
    -        DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +        DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
    -    if (dataMapJob != null &&
    -        (distributedCG ||
    -        (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
    -      DistributableDataMapFormat datamapDstr =
    -          new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    -              partitionsToPrune, BlockletDataMapFactory.class.getName());
    -      prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    -      // Apply expression on the blocklets.
    -      prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    if (distributedCG) {
    --- End diff --

    I am a bit confused. For CG, there is always a BlockletDataMap (CG), and the user may define additional CG datamaps. So there are 1 + N CG datamaps, where N >= 0. In this case, we should prune using BlockletDataMap first, then use the user-defined CG datamaps. Am I right? If I am correct, should this if check change to `if (dataMapJob != null)`?

---
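The pruning order jackylk describes — the default BlockletDataMap prunes first, then each of the N user-defined CG datamaps narrows the surviving blocklets further — can be sketched as below. This is an illustrative model only: `CgPruneOrder`, the predicate-based datamaps, and string blocklet ids are simplified stand-ins, not CarbonData's real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CgPruneOrder {

    /**
     * Applies the default datamap first, then every additional CG datamap,
     * so there are 1 + N pruning passes in total (N >= 0).
     */
    static List<String> prune(List<String> blocklets,
                              Predicate<String> defaultDataMap,
                              List<Predicate<String>> userCgDataMaps) {
        // Pass 1: the default (Blocklet-level) datamap always runs.
        List<String> survivors = new ArrayList<>();
        for (String blocklet : blocklets) {
            if (defaultDataMap.test(blocklet)) {
                survivors.add(blocklet);
            }
        }
        // Passes 2..N+1: each user-defined CG datamap only sees survivors.
        for (Predicate<String> dataMap : userCgDataMaps) {
            survivors.removeIf(dataMap.negate());
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<String> blocklets = List.of("b1", "b2", "b3", "b4");
        Predicate<String> defaultDm = b -> !b.equals("b4");            // default prunes b4
        List<Predicate<String>> userDms = List.of(b -> !b.equals("b2")); // user CG prunes b2
        System.out.println(prune(blocklets, defaultDm, userDms));       // [b1, b3]
    }
}
```

Running the default datamap first matters for cost: it cheaply discards most blocklets, so the user-defined CG datamaps only evaluate the remainder.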
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183230227

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
                 .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
                     CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
         DataMapExprWrapper dataMapExprWrapper =
    -        DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +        DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
    -    if (dataMapJob != null &&
    -        (distributedCG ||
    -        (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
    -      DistributableDataMapFormat datamapDstr =
    -          new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    -              partitionsToPrune, BlockletDataMapFactory.class.getName());
    -      prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    -      // Apply expression on the blocklets.
    -      prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    if (distributedCG) {
    +      prunedBlocklets =
    +          executeDataMapJob(carbonTable, resolver, segmentIds, dataMapExprWrapper, dataMapJob,
    +              partitionsToPrune);
         } else {
           prunedBlocklets = dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
         }
    +    dataMapExprWrapper =
    +        DataMapChooser.get().chooseFG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +    if (dataMapExprWrapper != null &&
    +        dataMapExprWrapper.getDataMapType() == DataMapLevel.FG &&
    +        isFgDataMapPruningEnable(job.getConfiguration())) {
    --- End diff --

    seems the `dataMapJob != null` check is missing

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183230236

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -417,6 +421,38 @@ protected Expression getFilterPredicates(Configuration configuration) {
         return resultFilterredBlocks;
       }

    +  private List<ExtendedBlocklet> executeDataMapJob(CarbonTable carbonTable,
    +      FilterResolverIntf resolver, List<Segment> segmentIds, DataMapExprWrapper dataMapExprWrapper,
    +      DataMapJob dataMapJob, List<PartitionSpec> partitionsToPrune) throws IOException {
    +    DistributableDataMapFormat datamapDstr =
    +        new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    +            partitionsToPrune, BlockletDataMapFactory.class.getName());
    +    List<ExtendedBlocklet> prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    +    // Apply expression on the blocklets.
    +    prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    return prunedBlocklets;
    +  }
    +
    +  private void updateSegments(List<Segment> segments, List<ExtendedBlocklet> prunedBlocklets) {
    --- End diff --

    please provide a comment for this function; it is not easy to understand what it does

---
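The request for a comment is fair: from the signature alone, a reader has to guess the intent of `updateSegments`. Purely as an illustration of the ambiguity, one plausible reading is "shrink the segment list to the segments that still own at least one pruned blocklet". The sketch below is an assumption about that intent, using simplified stand-in types, not CarbonData's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UpdateSegmentsSketch {

    // Stand-in for ExtendedBlocklet: only the owning segment id matters here.
    record Blocklet(String segmentId) {}

    /**
     * Hypothetical reading of updateSegments: keep only the segments that are
     * still referenced by at least one surviving (pruned) blocklet, so later
     * pruning stages skip segments that are already fully eliminated.
     */
    static void updateSegments(List<String> segments, List<Blocklet> prunedBlocklets) {
        Set<String> liveSegments = new HashSet<>();
        for (Blocklet blocklet : prunedBlocklets) {
            liveSegments.add(blocklet.segmentId());
        }
        segments.retainAll(liveSegments);
    }

    public static void main(String[] args) {
        List<String> segments = new ArrayList<>(List.of("0", "1", "2"));
        // Only segments 0 and 2 still have blocklets after CG pruning.
        updateSegments(segments, List.of(new Blocklet("0"), new Blocklet("2")));
        System.out.println(segments); // [0, 2]
    }
}
```

Whatever the real behavior is, a one-line Javadoc stating it would resolve exactly the confusion raised here.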
Github user ravipesala commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183233426

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
                 .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
                     CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
         DataMapExprWrapper dataMapExprWrapper =
    -        DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +        DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
    -    if (dataMapJob != null &&
    -        (distributedCG ||
    -        (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
    -      DistributableDataMapFormat datamapDstr =
    -          new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    -              partitionsToPrune, BlockletDataMapFactory.class.getName());
    -      prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    -      // Apply expression on the blocklets.
    -      prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    if (distributedCG) {
    --- End diff --

    Ok, corrected now. It first always prunes with the default datamap, then prunes further with the remaining datamaps.

---
Github user ravipesala commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183233431

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
                 .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
                     CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
         DataMapExprWrapper dataMapExprWrapper =
    -        DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +        DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
    -    if (dataMapJob != null &&
    -        (distributedCG ||
    -        (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
    -      DistributableDataMapFormat datamapDstr =
    -          new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    -              partitionsToPrune, BlockletDataMapFactory.class.getName());
    -      prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    -      // Apply expression on the blocklets.
    -      prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    if (distributedCG) {
    +      prunedBlocklets =
    +          executeDataMapJob(carbonTable, resolver, segmentIds, dataMapExprWrapper, dataMapJob,
    +              partitionsToPrune);
         } else {
           prunedBlocklets = dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
         }
    +    dataMapExprWrapper =
    +        DataMapChooser.get().chooseFG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +    if (dataMapExprWrapper != null &&
    +        dataMapExprWrapper.getDataMapType() == DataMapLevel.FG &&
    +        isFgDataMapPruningEnable(job.getConfiguration())) {
    --- End diff --

    ok

---
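Taken together, the change the PR title describes is a two-stage pipeline: CG pruning always runs first, and FG pruning (only when FG pruning is enabled and a datamap job is available, per the `dataMapJob != null` guard raised above) refines just the CG survivors. A minimal sketch of that staging, with simplified stand-in types rather than CarbonData's real classes:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class CgThenFgPrune {

    /**
     * Stage 1: coarse-grained (CG) pruning always runs.
     * Stage 2: fine-grained (FG) pruning runs only when enabled and a
     * datamap job exists, and only over the CG survivors.
     */
    static List<String> prune(List<String> blocklets,
                              Predicate<String> cgPrune,
                              Predicate<String> fgPrune,
                              boolean fgEnabled,
                              boolean hasDataMapJob) {
        List<String> cgSurvivors = blocklets.stream()
            .filter(cgPrune)
            .collect(Collectors.toList());
        if (fgEnabled && hasDataMapJob) {
            return cgSurvivors.stream().filter(fgPrune).collect(Collectors.toList());
        }
        return cgSurvivors;
    }

    public static void main(String[] args) {
        List<String> out = prune(List.of("b1", "b2", "b3"),
            b -> !b.equals("b3"),   // CG drops b3
            b -> b.equals("b1"),    // FG keeps only b1
            true, true);
        System.out.println(out);    // [b1]
    }
}
```

The ordering is the point of the PR: the cheaper CG stage shrinks the candidate set before the more expensive FG datamaps (e.g. a distributed lucene-style index) are consulted at all.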
Github user ravipesala commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183233432

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -417,6 +421,38 @@ protected Expression getFilterPredicates(Configuration configuration) {
         return resultFilterredBlocks;
       }

    +  private List<ExtendedBlocklet> executeDataMapJob(CarbonTable carbonTable,
    +      FilterResolverIntf resolver, List<Segment> segmentIds, DataMapExprWrapper dataMapExprWrapper,
    +      DataMapJob dataMapJob, List<PartitionSpec> partitionsToPrune) throws IOException {
    +    DistributableDataMapFormat datamapDstr =
    +        new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    +            partitionsToPrune, BlockletDataMapFactory.class.getName());
    +    List<ExtendedBlocklet> prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    +    // Apply expression on the blocklets.
    +    prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    return prunedBlocklets;
    +  }
    +
    +  private void updateSegments(List<Segment> segments, List<ExtendedBlocklet> prunedBlocklets) {
    --- End diff --

    ok, added

---
Github user ravipesala commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183233439

    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }

    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    --- End diff --

    ok

---
Github user ravipesala commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183233442

    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }

    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    +      throws IOException {
    +    if (resolverIntf != null) {
    +      Expression expression = resolverIntf.getFilterExpression();
    +      // First check for FG datamaps if any exist
    +      List<TableDataMap> allDataMapFG =
    +          DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
    +      ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
    +      if (tuple.dataMapExprWrapper != null) {
    +        return tuple.dataMapExprWrapper;
    +      }
    +    }
    +    // Return the default datamap if no other datamap exists.
    +    return null;
    +  }
    +
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseCG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    --- End diff --

    ok

---
Github user ravipesala commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2204#discussion_r183233446

    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }

    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    --- End diff --

    ok

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2204

    Build Success with Spark 2.2.1, Please check CI
    http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4122/

---
Github user CarbonDataQA commented on the issue:
    https://github.com/apache/carbondata/pull/2204

    Build Success with Spark 2.1.0, Please check CI
    http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5302/

---