Apache CarbonData Dev Mailing List archive

Re: Improve carbondata CDC performance

Posted by akashrn5 on Feb 18, 2021; 10:23am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improve-carbondata-CDC-performance-tp106093p106316.html

Hi,

I get your point: basically, you want this logic to be usable for
normal joins as well.
However, I raised a discussion on exactly that earlier; you can check it
here:
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Join-optimization-with-Carbondata-s-metadata-tt103186.html#a103187>

Since newer Spark versions already handle this, everyone's opinion there was
not to implement it ahead of Spark. So this will be specific to CDC, where the
case is a little different: we join the intermediate dataframe with the source
to get the files to scan. So this should be fine.

The only problem is the Cartesian product mentioned in the design doc. Please
check it and give your inputs on that. I also have one more solution: searching
in a distributed way using an interval tree data structure.
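
To make the interval tree idea a bit more concrete, here is a rough sketch in
Python. It is only an illustration under assumptions, not the design from the
doc: it assumes each target file exposes a min/max range for the join key, and
the names (IntervalNode, files_to_scan, f1..f4) and the sample values are
hypothetical. The tree answers "which files can contain this key" without
comparing every key against every file range, which is what the Cartesian
product would do.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (min_key, max_key, file_name) -- hypothetical per-file min/max of the join key.
Interval = Tuple[int, int, str]


class IntervalNode:
    """Centered interval tree node; must be built from a non-empty list."""

    def __init__(self, intervals: List[Interval]) -> None:
        # Use the median endpoint as the center of this node.
        points = sorted(p for lo, hi, _ in intervals for p in (lo, hi))
        self.center = points[len(points) // 2]
        left: List[Interval] = []
        right: List[Interval] = []
        here: List[Interval] = []
        for iv in intervals:
            lo, hi, _ = iv
            if hi < self.center:
                left.append(iv)    # entirely left of the center
            elif lo > self.center:
                right.append(iv)   # entirely right of the center
            else:
                here.append(iv)    # overlaps the center
        # Intervals overlapping the center, sorted for early exit while scanning.
        self.by_lo = sorted(here, key=lambda iv: iv[0])
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left: Optional["IntervalNode"] = IntervalNode(left) if left else None
        self.right: Optional["IntervalNode"] = IntervalNode(right) if right else None

    def stab(self, key: int, out: Set[str]) -> None:
        """Add to `out` every file whose [min, max] range contains `key`."""
        if key < self.center:
            for lo, _, name in self.by_lo:
                if lo > key:
                    break
                out.add(name)
            if self.left:
                self.left.stab(key, out)
        elif key > self.center:
            for _, hi, name in self.by_hi:
                if hi < key:
                    break
                out.add(name)
            if self.right:
                self.right.stab(key, out)
        else:
            out.update(name for _, _, name in self.by_lo)


def files_to_scan(files: List[Interval], source_keys: Iterable[int]) -> Set[str]:
    """Return the files whose key range contains at least one source key."""
    if not files:
        return set()
    tree = IntervalNode(files)
    hits: Set[str] = set()
    for key in source_keys:
        tree.stab(key, hits)
    return hits


if __name__ == "__main__":
    files = [(1, 10, "f1"), (5, 20, "f2"), (30, 40, "f3"), (35, 50, "f4")]
    print(sorted(files_to_scan(files, [7, 33])))
```

For the distributed part, one option (again an assumption) would be to build
the tree once on the driver, broadcast it, and let each executor probe it with
its own partition of source keys.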

Thanks,
Akash
