Apache CarbonData Dev Mailing List archive

Re: [DISCUSSION]Join optimization with Carbondata's metadata

Posted by akashrn5 on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Join-optimization-with-Carbondata-s-metadata-tp103186p103187.html

Please note the points below, in addition to the above.

1. There is a JIRA in Spark similar to what I have raised:

https://issues.apache.org/jira/browse/SPARK-27227
It aims at the same thing, but it is still in progress and targeted for Spark
3.1.0.
The plan there is to first execute a query on the right table to collect
information such as min/max values and a bloom index, and then apply that to
the left table. The design is still in review; please go through it once.
We can look deeper into it.

2.
https://www.qubole.com/blog/enhance-spark-performance-with-dynamic-filtering/
This is a similar approach, but it is available only in a private version,
so please consider it as well.
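
To make the idea in the two points above concrete, here is a minimal sketch of
collecting a filter from the small (right) table and applying it to the large
(left) table. It is plain Python over lists of dicts, not Spark or CarbonData
code. All names (KeyFilter, build_key_filter, filter_left) are hypothetical,
and an exact set of keys stands in for a bloom index, so this shows only the
shape of the technique, not the design proposed in SPARK-27227.

```python
# Illustrative sketch only; names are hypothetical, not Spark or CarbonData APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyFilter:
    """Summary of the join keys seen on the small (right) side."""
    min_key: int
    max_key: int
    keys: frozenset  # exact key set, standing in for a bloom index

    def might_match(self, key):
        # Cheap range check first, then the membership check.
        return self.min_key <= key <= self.max_key and key in self.keys


def build_key_filter(right_rows, key):
    """Scan the small table once and summarise its join keys."""
    keys = frozenset(row[key] for row in right_rows)
    if not keys:
        return None  # empty right side: an inner join returns nothing
    return KeyFilter(min(keys), max(keys), keys)


def filter_left(left_rows, key, key_filter):
    """Drop left rows that cannot find a join partner (inner join only)."""
    if key_filter is None:
        return []
    return [row for row in left_rows if key_filter.might_match(row[key])]


right = [{"id": 10, "name": "a"}, {"id": 12, "name": "b"}]
left = [{"id": i, "value": i * i} for i in range(100)]

key_filter = build_key_filter(right, "id")
reduced_left = filter_left(left, "id", key_filter)
```

Only the rows in reduced_left need to take part in the join. A real bloom index
could also let through some non-matching rows, which the join itself would then
discard.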

Along with the above, we can use our segment info metadata, or perhaps store
the info in a cache once we scan the small table, and use it to reduce the
scan of the big table.
This is worth considering because we still do not have Spark 3 integration
and dynamic filtering in Spark is still in the design phase.
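
A rough sketch of that segment-level idea, under stated assumptions: each
segment of the big table is assumed to expose min/max values for the join
column, and the key range of the small table is kept in a simple in-memory
cache after its first scan. SegmentStats, small_table_range and prune_segments
are hypothetical names for illustration, not existing CarbonData classes or
methods.

```python
# Illustrative sketch only; these are not CarbonData APIs.
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Assumed per-segment min/max of the join column in the big table."""
    segment_id: str
    min_key: int
    max_key: int


# Hypothetical cache: small-table name -> (min, max) of its join keys.
_range_cache = {}


def small_table_range(table_name, join_keys):
    """Compute the join-key range once per small table and cache it."""
    if table_name not in _range_cache:
        _range_cache[table_name] = (min(join_keys), max(join_keys))
    return _range_cache[table_name]


def prune_segments(segments, key_range):
    """Keep only segments whose [min, max] overlaps the small table's range."""
    low, high = key_range
    return [s for s in segments if s.max_key >= low and s.min_key <= high]


big_table_segments = [
    SegmentStats("0", 1, 100),
    SegmentStats("1", 101, 200),
    SegmentStats("2", 201, 300),
]

# Pretend a scan of the small table produced these join keys.
key_range = small_table_range("small_tbl", [150, 162, 180])
segments_to_scan = prune_segments(big_table_segments, key_range)
```

In this example only segment "1" overlaps the range 150 to 180, so the other
two segments of the big table would not need to be scanned.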

Please share your inputs so that we can discuss further.

Thanks

Akash R
