[GitHub] carbondata pull request #2767: [CARBONDATA-2974] Fixed multiple expressions ...

classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2767: [CARBONDATA-2974] Fixed multiple expressions ...

qiuchenjian-2
GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/2767

    [CARBONDATA-2974] Fixed multiple expressions issue on datamap chooser and bloom datamap

    Problem:
    When datamap created on multiple columns and queried with multiple filter expressions then query is failing.
    Solution:
    It is failing because datamap chooser could not merge multiple expressions to single expression. And also bloom filter cannot support And expressions.
   
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata datamap-choose-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2767.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2767
   
----
commit 47776c13481b44bff648733060e09b775e720b33
Author: ravipesala <ravi.pesala@...>
Date:   2018-09-26T11:26:03Z

    Fixed multiple expressions issue on datamap chooser and bloom datamap

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/518/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8849/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/782/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/599/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kevinjmh commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Hi @ravipesala , I have some question about this PR
    1. It is a little confusing to only deal with AND case and ignoring OR case if we want the framework do expression merging in general.
    2. This PR still failed when querying on STRING column with bloom filter like "column1 = 123". Please refer to case 1 in description of PR #2665 for details
    3. One more fail case found if apply this PR:  
      3.1 query with filter alike `(Col1='a'  or Col2 ='b')  and (Col3='c' or Col4='d' )  `
      3.2 ensure test result without bloom should not be empty
      3.3 create bloom filter on multi columns in same datamap
      3.4 test result with bloom would be empty. Because OrExpression is not proceeded when creating bloom query model now.
    Please check it.
   
    One more thing I think we could do to improve datamap pruning efficiency is to reuse pruning result between filter conditions.  Such as we query with two filter conditions on segment with 5 blocklets, we get 2 hit blocklets after applying first filter condition, and then the second filter condition can only apply on those 2 blocklets instead of 5.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/791/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8858/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/605/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    @kevinjmh
    1. DataMap implementations supposed to deal only with AND condition not OR condition. Because we can apply composite index only when we have AND condition.  DataMap framework will handle the OR and union them.
    2. STRING column with bloom filter like "column1 = 123" is fixed in this PR and added test case as well.
    3. I have verified with OR conditions as well and test also added, it works fine.
    If you find any other issues please give me test case I will fix here.
   
    Yes, you are right. Framework should have way to reuse the pruned result in case of AND condition. Will add this optimization in another PR in next version.
   



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    But it still forward the expressions to bloom data map even if it cannot support them.
   
    For example, notequal expression will be passed to Bloom too even if it isn't supposed to handle it.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Besides, in bloom data map we should do intersection with the pruned results generated by each expression.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8864/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/797/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    @xuchuanyin Fixed it, now passes only required expressions. Please check


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/609/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/801/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2767
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8870/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2767: [CARBONDATA-2974] Fixed multiple expressions ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kevinjmh commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2767#discussion_r221117943
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -177,34 +180,35 @@ private ExpressionTuple selectDataMap(Expression expression, List<TableDataMap>
               // If both left and right has datamap then we can either merge both datamaps to single
               // datamap if possible. Otherwise apply AND expression.
               if (left.dataMapExprWrapper != null && right.dataMapExprWrapper != null) {
    -            filterExpressionTypes.add(
    -                left.dataMapExprWrapper.getFilterResolverIntf().getFilterExpression()
    -                    .getFilterExpressionType());
    -            filterExpressionTypes.add(
    -                right.dataMapExprWrapper.getFilterResolverIntf().getFilterExpression()
    -                    .getFilterExpressionType());
    +            filterExpressionTypes.addAll(left.filterExpressionTypes);
    +            filterExpressionTypes.addAll(right.filterExpressionTypes);
                 List<ColumnExpression> columnExpressions = new ArrayList<>();
                 columnExpressions.addAll(left.columnExpressions);
                 columnExpressions.addAll(right.columnExpressions);
                 // Check if we can merge them to single datamap.
                 TableDataMap dataMap =
                     chooseDataMap(allDataMap, columnExpressions, filterExpressionTypes);
    +            TrueConditionalResolverImpl resolver = new TrueConditionalResolverImpl(
    +                new AndExpression(left.expression, right.expression), false,
    +                true);
    --- End diff --
   
    so this resolver is used for re-organizing the required expressions?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2767: [CARBONDATA-2974] Fixed multiple expressions ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2767#discussion_r221119029
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -282,7 +280,9 @@ private void extractColumnExpression(Expression expression,
           List<Expression> children = expression.getChildren();
           if (children != null && children.size() > 0) {
             for (Expression exp : children) {
    -          extractColumnExpression(exp, columnExpressions);
    +          if (exp != null && exp.getFilterExpressionType() != ExpressionType.UNKNOWN) {
    --- End diff --
   
    I think it's not proper to deal with 'SparkUnknownExpression' specially in framework. You can get the idea from @kevinjmh 's previous PR: The datamap instance know what it wants, so better to let it decide.


---
12