[DISCUSSION]Join optimization with Carbondata's metadata

Posted by akashnilugal@gmail.com on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Join-optimization-with-Carbondata-s-metadata-tp103186.html

Hi Community,

Carbondata stores rich metadata, which is already used in many ways to improve query and load performance.
So I think we should leverage this metadata further to improve query performance, especially
join performance.

Let's assume we have a query that joins two tables, t1 and t2, with no filter condition, only the join keys.
Both tables would then be scanned fully and joined on the join key.

But if the left table is very big, this takes a lot of time. So what if we take the min-max of the join key from the right table
and apply it as a BETWEEN or range filter on the left table? (As we store the min-max of each segment in the segment file,
we can use this info to build the filter.) The left table scan would then read less data, which would improve join performance.
I have attached a doc with some examples; please check it and let me know what you think.
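
A minimal sketch of the idea in plain Python, not Carbondata code. The names Segment, make_segment, key_range and pruned_join are hypothetical and only illustrate the pruning logic; it assumes an inner equi-join on a single integer key, since rows of the left table that fall outside the right table's key range can only be dropped safely when unmatched left rows are not needed in the result.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join_key, payload)


@dataclass
class Segment:
    """Hypothetical stand-in for a segment: rows plus min/max of the join key."""
    rows: List[Row]
    min_key: int
    max_key: int


def make_segment(rows: List[Row]) -> Segment:
    # The min/max is computed once, at "load" time, and kept as metadata.
    keys = [k for k, _ in rows]
    return Segment(rows, min(keys), max(keys))


def key_range(segments: List[Segment]) -> Tuple[int, int]:
    # Overall min/max of the join key, read from segment metadata only.
    return (min(s.min_key for s in segments),
            max(s.max_key for s in segments))


def pruned_join(left: List[Segment], right: List[Segment]):
    """Inner join on the key; returns (joined rows, left segments scanned)."""
    if not right:
        return [], 0  # inner join with an empty right side has no rows
    lo, hi = key_range(right)

    # Build a lookup on the (smaller) right table.
    right_index: Dict[int, List[str]] = {}
    for seg in right:
        for k, v in seg.rows:
            right_index.setdefault(k, []).append(v)

    result, scanned = [], 0
    for seg in left:
        # Segment-level pruning: skip segments whose key range cannot overlap.
        if seg.max_key < lo or seg.min_key > hi:
            continue
        scanned += 1
        for k, v in seg.rows:
            if lo <= k <= hi:  # the derived BETWEEN filter
                for rv in right_index.get(k, []):
                    result.append((k, v, rv))
    return result, scanned


left = [
    make_segment([(1, "a"), (5, "b")]),
    make_segment([(100, "c"), (150, "d")]),
    make_segment([(1000, "e"), (2000, "f")]),
]
right = [make_segment([(120, "x"), (150, "y")])]

result, scanned = pruned_join(left, right)
print(result, scanned)  # [(150, 'd', 'y')] 1
```

Here the right table's key range is [120, 150], so only one of the three left segments is scanned. The benefit depends on how well the join key is clustered: if every left segment spans the whole key range, nothing is pruned.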

Please give your feedback and any other inputs/suggestions on how to go ahead.

Thanks,

Regards,
Akash R Nilugal

Attachment: Query_Join_Pruning.docx (344K)