GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/2060 [CARBONDATA-2252] Query performance slows down as the number of columns increases in like query with OR expression

Problem: When a LIKE query with a "contains" or "ends with" pattern is combined with an OR condition, the filter is pushed down to the carbon layer. This makes the query slower than letting Spark apply the same filter to the results returned from carbon.

Analysis: For a LIKE query, the filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the number of columns increases, the computation time grows, and the query time grows with it.

Fix: If a LIKE query contains any OR condition, it is better to return all the results to Spark and let Spark do the computation.

- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done: Added test cases
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/manishgupta88/carbondata like_or_disable_pushdown

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2060.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #2060

----
commit c2b1bbc2f01700eab37a65df9ab7bd995973efc6
Author: manishgupta88 <tomanishgupta18@...>
Date: 2018-03-13T11:15:48Z

Problem: When a LIKE query with a "contains" or "ends with" pattern is combined with an OR condition, the filter is pushed down to the carbon layer. This makes the query slower than letting Spark apply the same filter to the results returned from carbon.
Analysis: For a LIKE query, the filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the number of columns increases, the computation time grows, and the query time grows with it.
Fix: If a LIKE query contains any OR condition, it is better to return all the results to Spark and let Spark do the computation.
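The Fix described in this PR amounts to a pushdown decision over the filter tree. The Java sketch below shows one way such a decision could be expressed; it is not the code from this PR. The filter types (`Or`, `And`, `EqualTo`, `StringStartsWith`, `StringEndsWith`, `StringContains`) are hypothetical stand-ins loosely modeled on Spark's data source filters, and `filtersToPushDown` is a hypothetical helper. Any top-level filter that has a contains or ends-with LIKE somewhere under an `Or` is kept out of the carbon layer and left for Spark to evaluate.

```java
import java.util.List;

// Hypothetical, simplified filter tree loosely modeled on Spark's data source
// filters. These are NOT the real Spark or CarbonData classes.
public class OrLikePushdown {

    interface Filter {}
    record Or(Filter left, Filter right) implements Filter {}
    record And(Filter left, Filter right) implements Filter {}
    record EqualTo(String column, Object value) implements Filter {}
    record StringStartsWith(String column, String value) implements Filter {}
    record StringEndsWith(String column, String value) implements Filter {}
    record StringContains(String column, String value) implements Filter {}

    // The LIKE shapes this PR reports as slow (contains / ends with).
    // Starts-with is treated as pushable here only because the PR does not mention it.
    static boolean isRowLevelLike(Filter f) {
        return f instanceof StringEndsWith || f instanceof StringContains;
    }

    // Returns true if a contains/ends-with filter appears anywhere below an Or node.
    static boolean hasRowLevelLikeUnderOr(Filter f, boolean insideOr) {
        if (f instanceof Or o) {
            return hasRowLevelLikeUnderOr(o.left(), true)
                || hasRowLevelLikeUnderOr(o.right(), true);
        }
        if (f instanceof And a) {
            return hasRowLevelLikeUnderOr(a.left(), insideOr)
                || hasRowLevelLikeUnderOr(a.right(), insideOr);
        }
        return insideOr && isRowLevelLike(f);
    }

    // Keeps only the filters that may go to the carbon layer; the rest stay with Spark.
    static List<Filter> filtersToPushDown(List<Filter> filters) {
        return filters.stream()
            .filter(f -> !hasRowLevelLikeUnderOr(f, false))
            .toList();
    }
}
```

Under this sketch, a plain `StringEndsWith` filter with no OR is still pushed down; only the OR combination is held back, which matches the scope described in the PR.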
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2060 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4244/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2060 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3000/
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2060 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3884/
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2060 Besides the OR condition, can we also evaluate other filter conditions to see how much benefit we get from pushing down LIKE contains and ends-with filters?
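Measuring the benefit per pattern type first requires telling the LIKE shapes apart. Below is a minimal sketch, assuming only `%` wildcards and no `_` or escape characters. `LikePatternKind` and `classify` are hypothetical names, and the mapping only roughly follows the spirit of Spark's `LikeSimplification` optimizer rule rather than reproducing it.

```java
// Hypothetical helper (not a CarbonData or Spark API): sorts a SQL LIKE pattern
// into the simple shapes discussed in this thread.
public class LikePatternKind {

    enum Kind { EQUALS, STARTS_WITH, ENDS_WITH, CONTAINS, OTHER }

    static Kind classify(String pattern) {
        // '_' (single-character wildcard) and '\' (escape) need general matching.
        if (pattern.contains("_") || pattern.contains("\\")) {
            return Kind.OTHER;
        }
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.endsWith("%");
        int start = leading ? 1 : 0;
        int end = pattern.length() - (trailing ? 1 : 0);
        if (end <= start) {
            // "", "%" or "%%": no literal text left to match on.
            return pattern.isEmpty() ? Kind.EQUALS : Kind.OTHER;
        }
        String literal = pattern.substring(start, end);
        if (literal.contains("%")) {
            return Kind.OTHER; // e.g. "ab%cd": more than one literal segment
        }
        if (leading && trailing) return Kind.CONTAINS; // '%abc%'
        if (leading) return Kind.ENDS_WITH;            // '%abc'
        if (trailing) return Kind.STARTS_WITH;         // 'abc%'
        return Kind.EQUALS;                            // 'abc'
    }
}
```

With this split, the two shapes the PR is concerned with are `CONTAINS` and `ENDS_WITH`, so a comparison could group test queries by these kinds with and without the OR condition.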
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2060 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3012/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2060 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4256/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2060 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3110/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2060 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4344/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2060 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5166/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2060 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3956/
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2060 This needs more work for proper optimization, so I am closing it for now.
Github user manishgupta88 closed the pull request at:
https://github.com/apache/carbondata/pull/2060