Apache CarbonData Dev Mailing List archive

Re: Improve carbondata CDC performance

Posted by kunalkapoor on Mar 29, 2021; 11:17am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improve-carbondata-CDC-performance-tp106093p107272.html

+1, agree with Ravi's suggestion.

On Thu, Mar 11, 2021 at 7:53 PM, Ravindra Pesala wrote:

> +1
> Instead of doing a Cartesian join, we can broadcast the sorted min/max values
> along with their file paths and do a binary search inside the map function.
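A minimal Python sketch of the suggestion above, assuming a single integer join key and that the block-level min/max of every target file has already been collected. `FileRange`, `build_index`, `candidate_files` and `map_partition` are hypothetical names for illustration, not CarbonData code. In Spark the sorted list would be shipped once with `SparkContext.broadcast` and read through `.value` inside `mapPartitions`; here it is a plain list so the logic runs standalone.

```python
from bisect import bisect_right
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class FileRange(NamedTuple):
    # Hypothetical record: block-level min/max of the join key for one target file.
    min_key: int
    max_key: int
    path: str


def build_index(ranges: Iterable[FileRange]) -> List[FileRange]:
    # Sort once on the driver by min_key; this sorted list is what would be broadcast.
    return sorted(ranges, key=lambda r: r.min_key)


def candidate_files(key: int, index: List[FileRange], mins: List[int]) -> List[str]:
    # Binary search: every entry in index[:upper] has min_key <= key.
    upper = bisect_right(mins, key)
    # Ranges may overlap, so keep each file in that prefix whose max also covers key.
    return [r.path for r in index[:upper] if r.max_key >= key]


def map_partition(source_keys: Iterable[int],
                  index: List[FileRange]) -> Iterator[Tuple[int, str]]:
    # Runs per partition; each source key is matched only against candidate files,
    # so no Cartesian product of source rows x files is materialized.
    mins = [r.min_key for r in index]
    for key in source_keys:
        for path in candidate_files(key, index, mins):
            yield key, path


index = build_index([
    FileRange(150, 300, "part-2"),
    FileRange(0, 99, "part-0"),
    FileRange(100, 199, "part-1"),
])
print(list(map_partition([50, 160, 500], index)))
# [(50, 'part-0'), (160, 'part-1'), (160, 'part-2')]
```

Because file ranges can overlap, the binary search only bounds the candidates (all files with min <= key) and the max check filters them; if the ranges were disjoint, at most one file could match, namely `index[upper - 1]`.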
>
> Thank you
>
> On Wed, 24 Feb 2021 at 13:02, akashrn5 wrote:
>
> > Hi Venu,
> >
> > Thanks for your review.
> >
> > I have replied to this in the document as well.
> > You are right:
> >
> > 1. It's taken care of: we group the extended blocklets by split path and
> > get the min-max at block level.
> > 2. We need to group by the file path to avoid duplicates in the DataFrame
> > output. I have updated the doc accordingly, please have a look.
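A minimal Python sketch of the two grouping steps in points 1 and 2, assuming each pruned blocklet is reported as a `(file_path, min, max)` tuple for a single integer join key. The tuple shape and the function names are hypothetical illustrations, not CarbonData's extended-blocklet API. On a Spark DataFrame the same steps would be a `groupBy` on the path column with `min`/`max` aggregates, and a `distinct` on the path column of the matched output.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape: (file_path, blocklet_min, blocklet_max) for one integer join key.
Blocklet = Tuple[str, int, int]


def block_level_min_max(blocklets: Iterable[Blocklet]) -> Dict[str, Tuple[int, int]]:
    # Point 1: group blocklets by file (split) path and widen the range per file.
    merged: Dict[str, Tuple[int, int]] = {}
    for path, lo, hi in blocklets:
        if path in merged:
            cur_lo, cur_hi = merged[path]
            merged[path] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            merged[path] = (lo, hi)
    return merged


def distinct_files(matches: Iterable[Tuple[int, str]]) -> List[str]:
    # Point 2: many source rows can hit the same file; keep each path once,
    # in first-seen order, so the file is not rewritten twice.
    return list(dict.fromkeys(path for _key, path in matches))


blocklets = [("f1", 10, 20), ("f1", 5, 15), ("f2", 30, 40), ("f1", 18, 25)]
print(block_level_min_max(blocklets))  # {'f1': (5, 25), 'f2': (30, 40)}
print(distinct_files([(1, "f1"), (2, "f1"), (3, "f2")]))  # ['f1', 'f2']
```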
> >
> > Thanks,
> > Akash R
> >
>
>
> --
> Thanks & Regards,
> Ravi
>