Apache CarbonData Dev Mailing List archive

Re: Improve carbondata CDC performance

Posted by ravipesala on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improve-carbondata-CDC-performance-tp106093p106748.html

+1
Instead of doing the Cartesian join, we can broadcast the sorted min/max
values along with their file paths and do a binary search inside the map
function.
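
A rough sketch of what I mean, in plain Python so it runs without a cluster.
The names build_index and candidate_files and the (min, max, file path) tuple
layout are made up for illustration and are not CarbonData APIs. In Spark the
index would be shipped once with a broadcast variable and looked up inside the
map function. Since file ranges can overlap, the sketch keeps a running
maximum so the backward scan can stop early:

```python
from bisect import bisect_right
from itertools import accumulate


def build_index(file_ranges):
    """Build a lookup index from (min_key, max_key, file_path) tuples.

    Hypothetical helper: entries are sorted by min_key, and prefix_max[i]
    holds the largest max_key among entries[0..i].
    """
    entries = sorted(file_ranges)
    mins = [e[0] for e in entries]
    prefix_max = list(accumulate((e[1] for e in entries), max))
    return entries, mins, prefix_max


def candidate_files(index, key):
    """Return paths of files whose [min_key, max_key] range contains key."""
    entries, mins, prefix_max = index
    # Binary search: only entries before position hi have min_key <= key.
    hi = bisect_right(mins, key)
    hits = []
    for i in range(hi - 1, -1, -1):
        # No entry at or before i can reach key, so stop scanning.
        if prefix_max[i] < key:
            break
        if entries[i][1] >= key:
            hits.append(entries[i][2])
    return hits[::-1]


# Made-up sample data; in a Spark job the index would be broadcast once
# and candidate_files called per source row inside map/mapPartitions.
index = build_index([(1, 10, "part-a"), (5, 20, "part-b"), (30, 40, "part-c")])
print(candidate_files(index, 7))
print(candidate_files(index, 25))
```

Each source row then costs one binary search plus a short scan over the
overlapping ranges, instead of a comparison against every target file.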

Thank you

On Wed, 24 Feb 2021 at 13:02, akashrn5 <[hidden email]> wrote:

> Hi Venu,
>
> Thanks for your review.
>
> I have replied to the same in the document.
> You are right.
>
> 1. It is already taken care of: the extended blocklets are grouped by
> split path to get the min-max at block level.
> 2. We need to group by the file path to avoid duplicates in the
> dataframe output. I have updated the same in the doc; please have a look.
>
> Thanks,
> Akash R

--
Thanks & Regards,
Ravi