Re: Improve carbondata CDC performance
Posted by ravipesala on Mar 11, 2021; 2:22pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improve-carbondata-CDC-performance-tp106093p106748.html
+1
Instead of doing a Cartesian join, we can broadcast the sorted min/max
values along with their file paths and do a binary search inside the map
function.
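
A rough sketch of the idea in plain Python, only to illustrate the lookup, not how it is implemented in CarbonData. The names build_index and candidate_files and the (min_key, max_key, file_path) tuple layout are made up for this example. In a Spark job the index would presumably be shipped to the executors once as a broadcast variable and candidate_files would be called inside the map function for each incoming key. Since the min/max ranges of different files can overlap, the binary search only bounds the candidates by min; a running prefix maximum is assumed here to stop the backward scan early.

```python
from bisect import bisect_right


def build_index(file_ranges):
    """Build a lookup index from (min_key, max_key, file_path) tuples.

    Hypothetical layout: entries sorted by min_key, plus a running
    maximum of max_key so a lookup can stop scanning early.
    """
    entries = sorted(file_ranges)          # sorted by min_key first
    mins = [e[0] for e in entries]
    prefix_max = []
    running = None
    for _, max_key, _ in entries:
        running = max_key if running is None else max(running, max_key)
        prefix_max.append(running)
    return entries, mins, prefix_max


def candidate_files(index, key):
    """Return paths of files whose [min_key, max_key] range contains key."""
    entries, mins, prefix_max = index
    # Binary search: entries[:hi] are exactly those with min_key <= key.
    hi = bisect_right(mins, key)
    hits = []
    i = hi - 1
    # Walk backwards; once no earlier entry can reach key, stop.
    while i >= 0 and prefix_max[i] >= key:
        _, max_key, path = entries[i]
        if max_key >= key:
            hits.append(path)
        i -= 1
    return hits


# Stand-in for the broadcast value; in Spark this would be broadcast once.
index = build_index([(1, 10, "a"), (5, 20, "b"), (30, 40, "c")])

# Stand-in for the map function applied to each changed key.
changed_keys = [7, 25, 30]
files_per_key = {k: sorted(candidate_files(index, k)) for k in changed_keys}
```

Each key costs one binary search plus a scan over only the overlapping candidates, instead of being compared against every file's min/max as in the Cartesian join.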
Thank you
On Wed, 24 Feb 2021 at 13:02, akashrn5 <[hidden email]> wrote:
> Hi Venu,
>
> Thanks for your review.
>
> I have replied to the same in the document.
> You are right.
>
> 1. Care is taken to group the extended blocklets by split path and get the
> min-max at block level.
> 2. We need to group by the file path to avoid duplicates in the
> dataframe output. I have updated the same in the doc; please have a look.
>
> Thanks,
> Akash R
--
Thanks & Regards,
Ravi