Posted by akashrn5 on Feb 18, 2021; 10:23am URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improve-carbondata-CDC-performance-tp106093p106316.html
Since Spark already handles this in its newer version, everyone's opinion was
not to build it ahead of Spark. So this will be specific to CDC, where the case
is a little different because we join an intermediate dataframe with the source
to get the files to scan. So this should be fine.
The only problem is the cartesian product, as mentioned in the design doc. You
can check and give your inputs on that. I also have one more solution: searching
in a distributed way with an interval tree data structure.
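
To make the interval tree idea concrete, here is a rough, single-process sketch. It is not the design-doc implementation. It assumes each data file exposes a (min, max) range for the join key, and the names `Node`, `build` and `files_containing` and the file names are hypothetical. Each node also stores the largest upper bound in its subtree, so whole subtrees can be skipped during a lookup instead of comparing every key with every file range.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    low: int                      # min value of the join key in the file (assumed available)
    high: int                     # max value of the join key in the file
    file: str                     # hypothetical file name
    max_high: int                 # largest `high` anywhere in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(intervals: List[Tuple[int, int, str]]) -> Optional[Node]:
    """Build a balanced tree ordered by `low` from (low, high, file) tuples."""
    items = sorted(intervals)

    def rec(start: int, end: int) -> Optional[Node]:
        if start >= end:
            return None
        mid = (start + end) // 2
        low, high, name = items[mid]
        node = Node(low, high, name, high)
        node.left = rec(start, mid)
        node.right = rec(mid + 1, end)
        # Propagate the largest upper bound up from the children.
        for child in (node.left, node.right):
            if child is not None:
                node.max_high = max(node.max_high, child.max_high)
        return node

    return rec(0, len(items))


def files_containing(node: Optional[Node], key: int) -> List[str]:
    """Return the files whose [low, high] range contains `key`."""
    # Nothing in this subtree reaches up to `key`, so skip it entirely.
    if node is None or node.max_high < key:
        return []
    found = files_containing(node.left, key)
    # Every range in the right subtree starts at or after node.low,
    # so the right side only matters when node.low <= key.
    if node.low <= key:
        if key <= node.high:
            found.append(node.file)
        found += files_containing(node.right, key)
    return found


# Hypothetical per-file min/max ranges of the join key.
tree = build([(1, 10, "f1"), (5, 20, "f2"), (15, 30, "f3"), (40, 50, "f4")])
candidates = files_containing(tree, 7)
```

For the distributed part, one option (again only an assumption) would be to build the tree once from the file-level min/max values, ship it to the executors, and have each task look up its own slice of source keys against it.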