
Re: [Discussion]Query Regarding Task launch mechanism for data load operations

Posted by Venkata Gollamudi on Aug 17, 2020; 1:38pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Query-Regarding-Task-launch-mechanism-for-data-load-operations-tp98711p98795.html

Hi Varun,

Yes, previously most cases were tuned for LOCAL_SORT, where merging happens
automatically.  But the data loading flow can certainly be improved to
launch tasks based on data size rather than a fixed configuration.
However, the old behaviour might still be required if the user has to
control the maximum number of partitions when the data size is too big.
This configuration was introduced because the data loading cores are not
transparent to Spark, mainly in the case of LOCAL_SORT.
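
To make the idea concrete, here is a rough sketch of size-based partition
selection with an optional user cap.  The function name, the 1 GB target
and the max_partitions parameter are assumptions for illustration only;
they are not existing CarbonData properties or APIs.

```python
def choose_num_partitions(total_bytes, target_partition_bytes=1024 ** 3,
                          max_partitions=None):
    """Pick a partition count from the input size (hypothetical helper).

    target_partition_bytes: assumed desired input size per load task.
    max_partitions: optional user-set upper bound, standing in for the
    old fixed-configuration behaviour when the data is very large.
    """
    if total_bytes <= 0:
        return 1
    # Integer ceiling division: enough partitions to cover all the data.
    n = (total_bytes + target_partition_bytes - 1) // target_partition_bytes
    if max_partitions is not None:
        # The user-supplied cap wins over the size-based estimate.
        n = min(n, max_partitions)
    return max(n, 1)
```

With these assumed defaults, a 10 GB load would get 10 partitions, while a
1000 GB load with max_partitions=200 would stay at 200.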

The same applies to the insert-into scenario as well; as you said,
coalescing will reduce the load performance.

Regards,
Ramana

On Fri, Aug 14, 2020 at 3:25 PM David CaiQiang <[hidden email]> wrote:

> This mechanism will work fine for LOCAL_SORT loading of big data and for a
> small cluster with big executors.
>
> If these conditions are not met, it is better to consider a new solution
> that adapts to the generic scenario.
>
> I suggest refactoring NO_SORT; maybe we can also check and improve the
> GLOBAL_SORT solution.
>
> The solution should support both NO_SORT and GLOBAL_SORT, and automatically
> determine the number of partitions to avoid the small file issue.
>
>
>
>
> -----
> Best Regards
> David Cai