Re: Improve carbondata CDC performance
Posted by akashrn5 on Mar 31, 2021; 7:40am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improve-carbondata-CDC-performance-tp106093p107295.html
Hi Ravi,
Thanks for your inputs.
Actually, the test with binary search and broadcasting didn't give much
benefit. From a code perspective, we would also need to sort the data ourselves
based on the min/max search logic for the array, and once we consider the
scenarios of multiple blocks with the same min and max, or with only the same
min or the same max, the code would not be very clean or modular. This was a
learning for me.
So I have updated the design along similar lines: first we deduplicate the
records, and then we use an interval tree, which handles the scenarios I
mentioned above, and whose insertion and search costs are low, as the test
results showed. The code is much smaller and easy to understand, and the
performance we get from it is very good. So I will be updating the design
based on that.
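To make the idea concrete, here is a minimal sketch of deduplicating the incoming keys and then pruning blocks with an interval tree. It assumes each block is described by an integer (min, max) range plus an id; the names `Node`, `build`, `stab` and `blocks_to_scan` are hypothetical and this is not the actual CarbonData code. It uses a standard augmented interval tree: a balanced BST over the intervals sorted by min, where each node also stores the largest max in its subtree so whole subtrees can be skipped. Blocks with identical or overlapping ranges need no special handling.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical block description: (min_key, max_key, block_id)
Block = Tuple[int, int, str]


@dataclass
class Node:
    start: int                      # block min value
    end: int                        # block max value
    block_id: str                   # hypothetical block identifier
    max_end: int                    # largest `end` in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(blocks: List[Block]) -> Optional[Node]:
    """Build a balanced, max-end augmented interval tree (assumed design)."""
    items = sorted(blocks)  # sort by min; duplicates and ties are fine

    def rec(lo: int, hi: int) -> Optional[Node]:
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        s, e, b = items[mid]
        left, right = rec(lo, mid), rec(mid + 1, hi)
        max_end = max(e, left.max_end if left else e, right.max_end if right else e)
        return Node(s, e, b, max_end, left, right)

    return rec(0, len(items))


def stab(node: Optional[Node], key: int, out: Set[str]) -> None:
    """Add the ids of all blocks whose [min, max] range contains key."""
    if node is None or key > node.max_end:
        return  # no interval in this subtree reaches key
    stab(node.left, key, out)
    if node.start <= key <= node.end:
        out.add(node.block_id)
    if key >= node.start:
        # right subtree only holds intervals starting at or after node.start
        stab(node.right, key, out)


def blocks_to_scan(root: Optional[Node], incoming_keys: List[int]) -> Set[str]:
    """Deduplicate the incoming CDC keys, then collect candidate blocks."""
    hits: Set[str] = set()
    for key in set(incoming_keys):
        stab(root, key, hits)
    return hits
```

For example, with blocks `(1, 10, "b0")`, `(5, 15, "b1")`, `(20, 30, "b2")` and `(20, 30, "b3")`, the keys `[12, 12, 25]` select only `b1`, `b2` and `b3`, while a key of `17` selects no block at all.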
I will try to raise a PR this week for review.
Thanks.
Regards,
Akash R