
RE: [SUGGESTION]Support compaction no_sort

Posted by manishgupta88 on Dec 05, 2018; 1:03pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/SUGGESTION-Support-compaction-no-sort-tp68631p69827.html

Hi Xuchuanyin

The scope of this feature is to sort the data during compaction when the
data was loaded using the NO_SORT option.
Some users want to maximize data load speed and then fine-tune the data
later, during off-peak time (when the system is least used), by running a
compaction operation.

Sorting will be done during compaction by considering the SORT_COLUMNS
property provided during create table operation.

Please find my response below to your queries.

1. Will it be proper to keep the sort_scope at table level? In this situation
it should be at segment level; keeping it at table level will confuse the
user.

Yes. This is expected, as the feature specifically supports sorting data
during compaction, so data loads are expected to be done with SORT_SCOPE
as NO_SORT. However, we cannot enforce this, so if multiple data loads are
done with different sort_scope values, then during compaction we have to
sort only the segments that are not sorted; the remaining segments should
go only through the merge sort flow.
After the compaction operation, all the data will be written using local sort.
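
To illustrate the flow described above, here is a minimal Python sketch. It
is not CarbonData code: the function name compact, the dictionary layout of a
segment, and the treatment of each already-sorted segment as a single sorted
run are all assumptions made only for illustration. It sorts only the
segments loaded with NO_SORT, then merges every run on the sort columns.

```python
import heapq


def compact(segments, sort_columns):
    """Merge segments into one list of rows ordered by sort_columns.

    segments: list of {"sort_scope": str, "rows": list of dict}
              (hypothetical layout, not CarbonData's segment format)
    """
    def key(row):
        # Compare rows on the sort columns, in the order given.
        return tuple(row[col] for col in sort_columns)

    runs = []
    for segment in segments:
        if segment["sort_scope"] == "NO_SORT":
            # Unsorted segment: sort it before it joins the merge.
            runs.append(sorted(segment["rows"], key=key))
        else:
            # Assumed to be sorted already at load time: reuse as-is.
            runs.append(segment["rows"])

    # K-way merge of the sorted runs (the "merge sort flow").
    return list(heapq.merge(*runs, key=key))


segments = [
    {"sort_scope": "NO_SORT", "rows": [{"id": 3}, {"id": 1}]},
    {"sort_scope": "LOCAL_SORT", "rows": [{"id": 2}, {"id": 4}]},
]
merged = compact(segments, ["id"])
```

Only the NO_SORT segment pays the cost of a full sort; the other segment is
read in its existing order and merged directly.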

Regards
Manish Gupta


