
Re: [Discussion] About carbon.si.segment.merge feature

Posted by Ajantha Bhat on Nov 10, 2020; 12:01pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-About-carbon-si-segment-merge-feature-tp103161p103191.html

@David:
a) Yes, SI can use global sort by default.
b) It is better to make the original SI load itself launch tasks based on
the SI segment size (we still need to figure out how to estimate it);
otherwise we have to go with one-task-per-node logic (similar to main table
local sort). But the current logic needs to be changed to avoid the small
files problem.
c) Refresh Index for SI is currently used only for merging small files, so I
think we have to rename this command; the name doesn't make sense.
ReIndex is for loading the missed SI segments from the main table, so it
cannot be used for merging.
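
The size-based alternative in point b) can be sketched as a small planning
function. This is a minimal sketch, not CarbonData code: the function name
`plan_si_load_tasks` and the per-task target size are hypothetical, and it
assumes the SI segment size can somehow be estimated up front (the open
question above). When no estimate is available it falls back to one task
per node.

```python
from typing import Optional

# Hypothetical knob: target amount of SI data written per task. This is an
# illustrative assumption, not an existing CarbonData property.
DEFAULT_TARGET_BYTES_PER_TASK = 256 * 1024 * 1024


def plan_si_load_tasks(
    segment_size_bytes: Optional[int],
    node_count: int,
    target_bytes_per_task: int = DEFAULT_TARGET_BYTES_PER_TASK,
) -> int:
    """Return how many tasks to launch when loading one SI segment.

    If the segment size cannot be estimated, fall back to one task per node
    (like main table local sort). Otherwise size the task count to the data,
    so a small segment is written by a single task instead of producing one
    small file per node.
    """
    if node_count < 1 or target_bytes_per_task < 1:
        raise ValueError("node_count and target_bytes_per_task must be positive")
    if segment_size_bytes is None:
        return node_count  # fallback: one task per node
    # Integer ceiling division; always launch at least one task.
    tasks = (segment_size_bytes + target_bytes_per_task - 1) // target_bytes_per_task
    return max(1, tasks)
```

With a 256 MB target, a 100 MB segment on a 4-node cluster gets one task
(one output file) instead of four, which is the small-files case point b)
is trying to avoid.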

@Akash:
a) The loading time difference between SI global_sort and local_sort is the
same as the data loading difference between global sort and local sort for
any table, and we already have that comparison.
b) Yes, after implementing the new SI load logic (launching tasks based on
segment size), we can compare its time with the refresh index time. If there
is not much difference, we can remove refresh index support for SI.
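
Noting down the tradeoff per segment, as suggested in the quoted mail, could
look like the sketch below. It assumes load times (in seconds) have already
been measured per segment ID for a baseline strategy (e.g. local sort, or the
current SI load followed by refresh index) and a candidate strategy (e.g.
global sort, or the new size-based SI load). The names `compare_load_times`
and `LoadTimeComparison` are hypothetical helpers, not part of CarbonData.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LoadTimeComparison:
    # Hypothetical result type: candidate minus baseline, in seconds.
    per_segment_delta: Dict[str, float]
    total_baseline: float
    total_candidate: float

    @property
    def overall_delta(self) -> float:
        return self.total_candidate - self.total_baseline


def compare_load_times(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> LoadTimeComparison:
    """Compare per-segment load times (seconds) for two strategies.

    Only segments timed under both strategies are compared, so a segment
    missing from one run does not skew the totals.
    """
    common = sorted(baseline.keys() & candidate.keys())
    deltas = {seg: candidate[seg] - baseline[seg] for seg in common}
    return LoadTimeComparison(
        per_segment_delta=deltas,
        total_baseline=sum(baseline[s] for s in common),
        total_candidate=sum(candidate[s] for s in common),
    )
```

A positive `overall_delta` means the candidate strategy is slower overall;
the per-segment deltas show whether that cost is spread evenly or
concentrated in a few large segments.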

Thanks,
Ajantha

On Mon, Nov 9, 2020 at 1:04 PM akashrn5 <[hidden email]> wrote:

> Hi,
>
> It's better to remove it, I feel, as a lot of code will be avoided and we
> can do it right the first time.
>
> But please consider the points below.
>
> 1. Maybe we can test the time difference between global sort and the
> existing local sort load time, perhaps on a per-segment basis, so that we
> get the overall time difference in load. Basically, if we can note down the
> tradeoff time, that's better for future reference and from the user
> perspective as well.
>
> 2. Also, can you check the refresh index and reload time difference? We
> need to see whether all users are fine with dropping and recreating again.
>
> Regards,
> Akash