[GitHub] carbondata pull request #2060: [CARBONDATA-2252] Query performance slows dow...

qiuchenjian-2

GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2060

[CARBONDATA-2252] Query performance slows down as the number of columns increases in like query with OR expression

Problem: When a query combines LIKE predicates (contains or ends with) using an OR condition, the filter is pushed down to the carbon layer. This makes the query slower than letting Spark apply the same filter to the results returned from carbon.

Analysis: For a LIKE query, the filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the number of columns increases, the computation time increases, and with it the query time.

Fix: If a LIKE query contains any OR condition, it is better to return all the results to Spark and let Spark do the computation.
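
As a rough illustration of the rule in the Fix above, the pushdown decision can be sketched as a walk over a filter expression tree. This is not CarbonData's code: the classes `Like`, `Equals`, `And`, `Or` and the function `should_push_down` are hypothetical names invented for this sketch, and it assumes that "contains" and "ends with" correspond to LIKE patterns that start with `%`. It also rejects the whole filter when an offending OR is found, whereas a real planner might push down the remaining parts of an AND.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, minimal filter-expression model (not CarbonData or Spark classes).
@dataclass(frozen=True)
class Like:
    column: str
    pattern: str  # SQL LIKE pattern, e.g. '%abc%' (contains) or '%abc' (ends with)


@dataclass(frozen=True)
class Equals:
    column: str
    value: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Like, Equals, And, Or]


def is_contains_or_ends_with(expr: Expr) -> bool:
    # Assumption: a leading '%' marks a "contains" or "ends with" LIKE pattern.
    return isinstance(expr, Like) and expr.pattern.startswith("%")


def has_like(expr: Expr) -> bool:
    # True if any node at or below expr is a contains / ends-with LIKE.
    if isinstance(expr, (And, Or)):
        return has_like(expr.left) or has_like(expr.right)
    return is_contains_or_ends_with(expr)


def should_push_down(expr: Expr) -> bool:
    # An OR that covers a contains / ends-with LIKE is left for Spark to evaluate.
    if isinstance(expr, Or):
        return not has_like(expr)
    if isinstance(expr, And):
        return should_push_down(expr.left) and should_push_down(expr.right)
    return True
```

Under these assumptions, `should_push_down(Or(Like("c1", "%abc%"), Like("c2", "%xyz")))` evaluates to `False`, so the filter would stay with Spark, while a LIKE that is only combined with AND is still pushed down.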

- [ ] Any interfaces changed?
No
- [ ] Any backward compatibility impacted?
No
- [ ] Document update required?
No
- [ ] Testing done
Added test cases
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata like_or_disable_pushdown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2060

----
commit c2b1bbc2f01700eab37a65df9ab7bd995973efc6
Author: manishgupta88 <tomanishgupta18@...>
Date: 2018-03-13T11:15:48Z

Problem: When a query combines LIKE predicates (contains or ends with) using an OR condition, the filter is pushed down to the carbon layer. This makes the
query slower than letting Spark apply the same filter to the results returned from carbon.

Analysis: For a LIKE query, the filter is executed by RowLevelFilterExecutorImpl, which evaluates the data row by row. As the
number of columns increases, the computation time increases, and with it the query time.

Fix: If a LIKE query contains any OR condition, it is better to return all the results to Spark and let Spark do the computation.

----

---

qiuchenjian-2