[GitHub] carbondata pull request #2060: [CARBONDATA-2252] Query performance slows dow...

qiuchenjian-2
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2060

    [CARBONDATA-2252] Query performance slows down as the number of columns increases in like query with OR expression

    Problem: When a LIKE query with contains or ends-with patterns is combined with an OR condition, the filter is pushed down to the carbon layer. The query then runs slower than when Spark applies the same filter to the results returned from carbon.

    Analysis: For LIKE queries the filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the number of columns increases, the computation time grows, and the query time grows with it.

    Fix: If a LIKE query contains any OR condition, return all the results to Spark and let Spark evaluate the filter.
   
     - [ ] Any interfaces changed?
       No
     - [ ] Any backward compatibility impacted?
       No
     - [ ] Document update required?
       No
     - [ ] Testing done
       Added test cases
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
       NA
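
The decision described under "Fix" can be illustrated with a small sketch. This is not the code from the pull request: the class `LikeOrPushdownSketch`, the `Expr`, `Leaf`, and `Binary` types, and the `shouldPushDown` method are all hypothetical names, and the expression tree is a simplified stand-in for the real Spark and CarbonData filter classes. It only shows the idea of keeping the whole filter in Spark when an OR node has a contains or ends-with LIKE beneath it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types are CarbonData or Spark classes.
final class LikeOrPushdownSketch {

    /** Leaf predicate kinds this sketch distinguishes (an assumed, simplified set). */
    enum Kind { EQUALS, LIKE_CONTAINS, LIKE_ENDS_WITH }

    /** Minimal filter expression tree node. */
    interface Expr {
        List<Expr> children();
    }

    /** A single-column predicate such as: name LIKE '%abc'. */
    static final class Leaf implements Expr {
        final String column;
        final Kind kind;

        Leaf(String column, Kind kind) {
            this.column = column;
            this.kind = kind;
        }

        @Override
        public List<Expr> children() {
            return Collections.emptyList();
        }
    }

    /** An AND or OR node over two sub-expressions. */
    static final class Binary implements Expr {
        final boolean isOr;
        final Expr left;
        final Expr right;

        Binary(boolean isOr, Expr left, Expr right) {
            this.isOr = isOr;
            this.left = left;
            this.right = right;
        }

        @Override
        public List<Expr> children() {
            return Arrays.asList(left, right);
        }
    }

    static Expr leaf(String column, Kind kind) { return new Leaf(column, kind); }
    static Expr and(Expr l, Expr r) { return new Binary(false, l, r); }
    static Expr or(Expr l, Expr r) { return new Binary(true, l, r); }

    /** True if the subtree contains a contains or ends-with LIKE predicate. */
    static boolean hasRowLevelLike(Expr e) {
        if (e instanceof Leaf) {
            Kind k = ((Leaf) e).kind;
            return k == Kind.LIKE_CONTAINS || k == Kind.LIKE_ENDS_WITH;
        }
        for (Expr child : e.children()) {
            if (hasRowLevelLike(child)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns false for the whole filter as soon as any OR node has a
     * contains or ends-with LIKE beneath it; otherwise returns true.
     */
    static boolean shouldPushDown(Expr e) {
        if (e instanceof Binary && ((Binary) e).isOr && hasRowLevelLike(e)) {
            return false;
        }
        for (Expr child : e.children()) {
            if (!shouldPushDown(child)) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, a filter such as `c1 LIKE '%a%' OR c2 LIKE '%b'` would stay in Spark, while `c1 LIKE '%a%' AND c2 = 'x'` would still be pushed down. Whether the real change should skip the whole filter or only the OR subtree is a design choice this sketch does not settle.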


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata like_or_disable_pushdown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2060
   
----
commit c2b1bbc2f01700eab37a65df9ab7bd995973efc6
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-03-13T11:15:48Z

    Problem: When a LIKE query with contains or ends-with patterns is combined with an OR condition, the filter is pushed down to the carbon layer. The
    query then runs slower than when Spark applies the same filter to the results returned from carbon.

    Analysis: For LIKE queries the filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the
    number of columns increases, the computation time grows, and the query time grows with it.

    Fix: If a LIKE query contains any OR condition, return all the results to Spark and let Spark evaluate the filter.

----


---

[GitHub] carbondata issue #2060: [CARBONDATA-2252] Query performance slows down as th...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4244/



---

[GitHub] carbondata issue #2060: [CARBONDATA-2252] Query performance slows down as th...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3000/



---

[GitHub] carbondata issue #2060: [CARBONDATA-2252] Query performance slows down as th...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3884/



---

[GitHub] carbondata issue #2060: [CARBONDATA-2252] Query performance slows down as th...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Besides the OR condition, can we also evaluate other filter conditions? How much benefit do we get from pushing down LIKE contains and ends-with filters?


---

[GitHub] carbondata issue #2060: [CARBONDATA-2252] Query performance slows down as th...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3012/



---

[GitHub] carbondata issue #2060: [CARBONDATA-2252] Query performance slows down as th...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4256/



---

[GitHub] carbondata issue #2060: [CARBONDATA-2252] Query performance slows down as th...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3110/



---

[GitHub] carbondata issue #2060: [CARBONDATA-2252] Query performance slows down as th...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4344/



---

[GitHub] carbondata issue #2060: [WIP] [CARBONDATA-2252] Query performance slows down...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5166/



---

[GitHub] carbondata issue #2060: [WIP] [CARBONDATA-2252] Query performance slows down...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3956/



---

[GitHub] carbondata issue #2060: [WIP] [CARBONDATA-2252] Query performance slows down...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2060
 
    This needs some more work for proper optimization, so I am closing it for now.


---

[GitHub] carbondata pull request #2060: [WIP] [CARBONDATA-2252] Query performance slo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 closed the pull request at:

    https://github.com/apache/carbondata/pull/2060


---