[GitHub] carbondata pull request #2410: [CARBONDATA-2650][Datamap] Fix bugs in negati...


[GitHub] carbondata pull request #2410: [CARBONDATA-2650][Datamap] Fix bugs in negati...

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2410

    [CARBONDATA-2650][Datamap] Fix bugs in negative number of skipped blocklets

    Currently in CarbonData, the default blocklet datamap is used first to prune
    blocklets, and then the other index datamaps are applied. However, these
    index datamaps prune at segment scope, so in some scenarios the size of
    their pruned result can be larger than that of the default datamap, which
    causes a negative number of skipped blocklets in the EXPLAIN query output.

    Here we intersect the results after each pruning step. If the pruned result
    size is zero, pruning finishes early.
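    A minimal sketch of this flow (simplified; `Blocklet`, the list arguments and
    the surrounding method are stand-ins rather than the real CarbonData classes,
    and only the intersection-and-early-exit logic mirrors the change, reusing the
    commons-collections `CollectionUtils.intersection` call that the patch uses):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.collections.CollectionUtils;

    public class PruneFlowSketch {

      // Stand-in for ExtendedBlocklet; equals/hashCode matter for intersection.
      static class Blocklet {
        final String id;
        Blocklet(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
          return o instanceof Blocklet && ((Blocklet) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
      }

      static List<Blocklet> prune(List<Blocklet> defaultPruned,
          List<Blocklet> cgPruned, List<Blocklet> fgPruned) {
        // 1. The result of the default (blocklet) datamap is the starting point.
        List<Blocklet> pruned = new ArrayList<>(defaultPruned);
        if (pruned.isEmpty()) {
          return pruned;   // nothing left, finish pruning early
        }
        // 2. The CG datamap prunes at segment scope, so its result can be a
        //    superset; intersect with the previous result instead of replacing it.
        pruned = (List) CollectionUtils.intersection(pruned, cgPruned);
        if (pruned.isEmpty()) {
          return pruned;
        }
        // 3. Same for the FG datamap result.
        pruned = (List) CollectionUtils.intersection(pruned, fgPruned);
        return pruned;
      }
    }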
   
    Be sure to complete all of the following checklist items to help us
    incorporate your contribution quickly and easily:
   
     - [x] Any interfaces changed?
     `NO`
     - [x] Any backward compatibility impacted?
      `NO`
     - [ ] Document update required?
     `NO`
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
     `NO, tests will be added in another PR`
            - How it is tested? Please attach test report.
    `Tested in local`
            - Is it a performance related change? Please attach the performance test report.
    `NO`
            - Any additional information to help reviewers in testing this change.
            `NA`
     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    `NA`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata issue_2650_negative_skipped_blocklets

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2410.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2410
   
----
commit a6508af91211c2b005a33bed4db2a7964bb27af6
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-26T06:11:17Z

    Fix bugs in negative number of skipped blocklets
   
    Currently in CarbonData, the default blocklet datamap is used first to prune
    blocklets, and then the other index datamaps are applied. However, these
    index datamaps prune at segment scope, so in some scenarios the size of
    their pruned result can be larger than that of the default datamap, which
    causes a negative number of skipped blocklets in the EXPLAIN query output.

    Here we intersect the results after each pruning step. If the pruned result
    size is zero, pruning finishes early.

----


---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6546/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5375/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5449/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6560/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5392/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5466/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    retest this please


---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6573/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5402/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    retest this please


---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5449/



---

[GitHub] carbondata pull request #2410: [CARBONDATA-2650][Datamap] Fix bugs in negati...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2410#discussion_r199047095
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // First prune using default datamap on driver side.
         DataMapExprWrapper dataMapExprWrapper = DataMapChooser
             .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    List<ExtendedBlocklet> prunedBlocklets =
    +    List<ExtendedBlocklet> finalPrunedBlocklets =
    --- End diff --
   
     It would be better to give this a different name; initially it holds only the result of the main index pruning.


---

[GitHub] carbondata pull request #2410: [CARBONDATA-2650][Datamap] Fix bugs in negati...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2410#discussion_r199047134
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // First prune using default datamap on driver side.
         DataMapExprWrapper dataMapExprWrapper = DataMapChooser
             .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    List<ExtendedBlocklet> prunedBlocklets =
    +    List<ExtendedBlocklet> finalPrunedBlocklets =
             dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
    -
         ExplainCollector.recordDefaultDataMapPruning(
    -        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
    +        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
    +    if (finalPrunedBlocklets.size() == 0) {
    +      return finalPrunedBlocklets;
    +    }
     
         DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
         if (cgDataMapExprWrapper != null) {
           // Prune segments from already pruned blocklets
    -      pruneSegments(segmentIds, prunedBlocklets);
    +      pruneSegments(segmentIds, finalPrunedBlocklets);
    +      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
           // Again prune with CG datamap.
           if (distributedCG && dataMapJob != null) {
    -        prunedBlocklets = DataMapUtil
    +        cgPrunedBlocklets = DataMapUtil
                 .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
    --- End diff --
   
     Can you move the function name to the previous line to make the formatting nicer?
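     For instance, something along these lines (illustrative formatting only, based
     on the call shown in the diff above):

         cgPrunedBlocklets = DataMapUtil.executeDataMapJob(
             carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
             partitionsToPrune);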


---

[GitHub] carbondata pull request #2410: [CARBONDATA-2650][Datamap] Fix bugs in negati...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2410#discussion_r199047302
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // First prune using default datamap on driver side.
         DataMapExprWrapper dataMapExprWrapper = DataMapChooser
             .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    List<ExtendedBlocklet> prunedBlocklets =
    +    List<ExtendedBlocklet> finalPrunedBlocklets =
             dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
    -
         ExplainCollector.recordDefaultDataMapPruning(
    -        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
    +        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
    +    if (finalPrunedBlocklets.size() == 0) {
    +      return finalPrunedBlocklets;
    +    }
     
         DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
         if (cgDataMapExprWrapper != null) {
           // Prune segments from already pruned blocklets
    -      pruneSegments(segmentIds, prunedBlocklets);
    +      pruneSegments(segmentIds, finalPrunedBlocklets);
    +      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
           // Again prune with CG datamap.
           if (distributedCG && dataMapJob != null) {
    -        prunedBlocklets = DataMapUtil
    +        cgPrunedBlocklets = DataMapUtil
                 .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
                     partitionsToPrune);
           } else {
    -        prunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
    +        cgPrunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
           }
    -
    +      // since index datamap prune in segment scope,
    +      // the result need to intersect with previous pruned result
    +      finalPrunedBlocklets = (List) CollectionUtils.intersection(
    +          finalPrunedBlocklets, cgPrunedBlocklets);
           ExplainCollector.recordCGDataMapPruning(
    -          cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
    +          cgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
    +    }
    +
    +    if (finalPrunedBlocklets.size() == 0) {
    +      return finalPrunedBlocklets;
         }
         // Now try to prune with FG DataMap.
         if (isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
           DataMapExprWrapper fgDataMapExprWrapper = chooser.chooseFGDataMap(resolver);
           if (fgDataMapExprWrapper != null) {
             // Prune segments from already pruned blocklets
    -        pruneSegments(segmentIds, prunedBlocklets);
    -        prunedBlocklets = DataMapUtil
    +        pruneSegments(segmentIds, finalPrunedBlocklets);
    +        List<ExtendedBlocklet> fgPrunedBlocklets = DataMapUtil
                 .executeDataMapJob(carbonTable, resolver, segmentIds, fgDataMapExprWrapper, dataMapJob,
                     partitionsToPrune);
    -
    +        finalPrunedBlocklets = (List) CollectionUtils.intersection(
    +            finalPrunedBlocklets, fgPrunedBlocklets);
             ExplainCollector.recordFGDataMapPruning(
    -            fgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
    +            fgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
           }
         } // TODO: add a else branch to push FGDataMap pruning to reader side
    --- End diff --
   
    This TODO can be removed now


---

[GitHub] carbondata pull request #2410: [CARBONDATA-2650][Datamap] Fix bugs in negati...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2410#discussion_r199310661
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // First prune using default datamap on driver side.
         DataMapExprWrapper dataMapExprWrapper = DataMapChooser
             .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    List<ExtendedBlocklet> prunedBlocklets =
    +    List<ExtendedBlocklet> finalPrunedBlocklets =
    --- End diff --
   
     Fixed. The original variable name 'prunedBlocklets' is now used to represent the final output.


---

[GitHub] carbondata pull request #2410: [CARBONDATA-2650][Datamap] Fix bugs in negati...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2410#discussion_r199310663
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // First prune using default datamap on driver side.
         DataMapExprWrapper dataMapExprWrapper = DataMapChooser
             .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    List<ExtendedBlocklet> prunedBlocklets =
    +    List<ExtendedBlocklet> finalPrunedBlocklets =
             dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
    -
         ExplainCollector.recordDefaultDataMapPruning(
    -        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
    +        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
    +    if (finalPrunedBlocklets.size() == 0) {
    +      return finalPrunedBlocklets;
    +    }
     
         DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
         if (cgDataMapExprWrapper != null) {
           // Prune segments from already pruned blocklets
    -      pruneSegments(segmentIds, prunedBlocklets);
    +      pruneSegments(segmentIds, finalPrunedBlocklets);
    +      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
           // Again prune with CG datamap.
           if (distributedCG && dataMapJob != null) {
    -        prunedBlocklets = DataMapUtil
    +        cgPrunedBlocklets = DataMapUtil
                 .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
    --- End diff --
   
    fixed


---

[GitHub] carbondata pull request #2410: [CARBONDATA-2650][Datamap] Fix bugs in negati...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2410#discussion_r199310665
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -433,46 +434,57 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // First prune using default datamap on driver side.
         DataMapExprWrapper dataMapExprWrapper = DataMapChooser
             .getDefaultDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    List<ExtendedBlocklet> prunedBlocklets =
    +    List<ExtendedBlocklet> finalPrunedBlocklets =
             dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
    -
         ExplainCollector.recordDefaultDataMapPruning(
    -        dataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
    +        dataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
    +    if (finalPrunedBlocklets.size() == 0) {
    +      return finalPrunedBlocklets;
    +    }
     
         DataMapChooser chooser = new DataMapChooser(getOrCreateCarbonTable(job.getConfiguration()));
     
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = chooser.chooseCGDataMap(resolver);
         if (cgDataMapExprWrapper != null) {
           // Prune segments from already pruned blocklets
    -      pruneSegments(segmentIds, prunedBlocklets);
    +      pruneSegments(segmentIds, finalPrunedBlocklets);
    +      List<ExtendedBlocklet> cgPrunedBlocklets = new ArrayList<>();
           // Again prune with CG datamap.
           if (distributedCG && dataMapJob != null) {
    -        prunedBlocklets = DataMapUtil
    +        cgPrunedBlocklets = DataMapUtil
                 .executeDataMapJob(carbonTable, resolver, segmentIds, cgDataMapExprWrapper, dataMapJob,
                     partitionsToPrune);
           } else {
    -        prunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
    +        cgPrunedBlocklets = cgDataMapExprWrapper.prune(segmentIds, partitionsToPrune);
           }
    -
    +      // since index datamap prune in segment scope,
    +      // the result need to intersect with previous pruned result
    +      finalPrunedBlocklets = (List) CollectionUtils.intersection(
    +          finalPrunedBlocklets, cgPrunedBlocklets);
           ExplainCollector.recordCGDataMapPruning(
    -          cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
    +          cgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
    +    }
    +
    +    if (finalPrunedBlocklets.size() == 0) {
    +      return finalPrunedBlocklets;
         }
         // Now try to prune with FG DataMap.
         if (isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
           DataMapExprWrapper fgDataMapExprWrapper = chooser.chooseFGDataMap(resolver);
           if (fgDataMapExprWrapper != null) {
             // Prune segments from already pruned blocklets
    -        pruneSegments(segmentIds, prunedBlocklets);
    -        prunedBlocklets = DataMapUtil
    +        pruneSegments(segmentIds, finalPrunedBlocklets);
    +        List<ExtendedBlocklet> fgPrunedBlocklets = DataMapUtil
                 .executeDataMapJob(carbonTable, resolver, segmentIds, fgDataMapExprWrapper, dataMapJob,
                     partitionsToPrune);
    -
    +        finalPrunedBlocklets = (List) CollectionUtils.intersection(
    +            finalPrunedBlocklets, fgPrunedBlocklets);
             ExplainCollector.recordFGDataMapPruning(
    -            fgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
    +            fgDataMapExprWrapper.getDataMapSchema(), finalPrunedBlocklets.size());
           }
         } // TODO: add a else branch to push FGDataMap pruning to reader side
    --- End diff --
   
    fixed


---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5531/



---

[GitHub] carbondata issue #2410: [CARBONDATA-2650][Datamap] Fix bugs in negative numb...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2410
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6676/



---