Re: Improve carbondata CDC performance
Posted by kunalkapoor on Mar 29, 2021; 11:17am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improve-carbondata-CDC-performance-tp106093p107272.html
+1, I agree with Ravi's suggestion.
On Thu, Mar 11, 2021 at 7:53 PM Ravindra Pesala <[hidden email]> wrote:
> +1
> Instead of doing the cartesian join, we can broadcast the sorted min/max
> with file paths and do the binary search inside the map function.
>
> Thank you
>
> On Wed, 24 Feb 2021 at 13:02, akashrn5 <[hidden email]> wrote:
>
> > Hi Venu,
> >
> > Thanks for your review.
> >
> > I have replied to the same effect in the document.
> > You are right.
> >
> > 1. This is already taken care of: the extended blocklets are grouped by
> > split path to get the min-max at block level.
> > 2. We need to group by file path to avoid duplicates in the dataframe
> > output. I have updated the doc accordingly; please have a look.
> >
> > Thanks,
> > Akash R
> >
> >
> >
>
>
> --
> Thanks & Regards,
> Ravi
>
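
For readers of the archive, here is a minimal sketch of the lookup discussed in this thread, in plain Python rather than Spark or CarbonData code. It assumes blocklet metadata arrives as (file_path, min, max) tuples and that the sorted index is what would be broadcast to the executors; the names block_level_min_max, MinMaxIndex and candidate_files are hypothetical and are not CarbonData APIs. Because the min/max ranges of different files can overlap, the binary search on the sorted minimums is followed by a short backward scan that is bounded by a running maximum.

```python
from bisect import bisect_right


def block_level_min_max(blocklets):
    """Collapse (file_path, min, max) blocklet rows into one range per file.

    Grouping by file path also removes duplicate entries for the same file.
    """
    ranges = {}
    for path, lo, hi in blocklets:
        if path in ranges:
            cur_lo, cur_hi = ranges[path]
            ranges[path] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            ranges[path] = (lo, hi)
    return ranges


class MinMaxIndex:
    """Sorted min/max index; this is the object one would broadcast (assumption)."""

    def __init__(self, ranges):
        # Sort files by their minimum value so the minimums can be bisected.
        entries = sorted((lo, hi, path) for path, (lo, hi) in ranges.items())
        self.mins = [e[0] for e in entries]
        self.maxs = [e[1] for e in entries]
        self.paths = [e[2] for e in entries]
        # prefix_max[i] is the largest max among entries 0..i; it lets the
        # backward scan stop as soon as no earlier file can contain the key.
        self.prefix_max = []
        running = None
        for hi in self.maxs:
            running = hi if running is None else max(running, hi)
            self.prefix_max.append(running)

    def candidate_files(self, key):
        """Return the files whose [min, max] range may contain key."""
        # Index of the last entry whose min is <= key.
        i = bisect_right(self.mins, key) - 1
        hits = []
        while i >= 0 and self.prefix_max[i] >= key:
            if self.maxs[i] >= key:
                hits.append(self.paths[i])
            i -= 1
        return sorted(hits)


# Example: two blocklets of part-0 collapse into the single range 1..20.
blocklets = [
    ("part-0", 1, 10),
    ("part-0", 8, 20),
    ("part-1", 15, 30),
    ("part-2", 40, 50),
]
index = MinMaxIndex(block_level_min_max(blocklets))
print(index.candidate_files(17))  # files whose range covers key 17
```

In a Spark job, the map function over the incoming change rows would call candidate_files on the broadcast index for each join key, so each row costs one binary search plus a short scan instead of a comparison against every file as in a cartesian join.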