[SUGGESTION]Support compaction no_sort
Posted by akashrn5 on Nov 20, 2018; 6:03am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/SUGGESTION-Support-compaction-no-sort-tp68631.html
Hi all,
Currently, when data is loaded with sort_scope set to NO_SORT and those
segments are later compacted, the data is still not sorted, which hurts
query performance.
The above problem can be solved by sorting the data during compaction,
which improves query performance.
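To make the query-side benefit concrete, here is a toy Python sketch. It is not CarbonData code. It assumes the query filters on the sort column and that the store keeps a min/max value per block, so a block can be skipped when its range does not overlap the filter. The helper names, block size and data are all made up for illustration.

```python
from typing import List, Tuple


def block_ranges(values: List[int], block_size: int) -> List[Tuple[int, int]]:
    """Split values into fixed-size blocks and return (min, max) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block)) for block in blocks]


def blocks_to_scan(values: List[int], block_size: int, low: int, high: int) -> int:
    """Count blocks whose [min, max] range overlaps the filter [low, high]."""
    return sum(
        1
        for block_min, block_max in block_ranges(values, block_size)
        if block_min <= high and block_max >= low
    )


# Rows in arrival order, as a NO_SORT load would leave them.
unsorted_values = [7, 2, 11, 4, 1, 8, 3, 12, 5, 10, 6, 9]
sorted_values = sorted(unsorted_values)

# Filter: value BETWEEN 5 AND 6, with 4 rows per block.
scan_unsorted = blocks_to_scan(unsorted_values, 4, 5, 6)
scan_sorted = blocks_to_scan(sorted_values, 4, 5, 6)
print(scan_unsorted, scan_sorted)
```

With the unsorted layout every block's range overlaps the filter, so all three blocks are read; with the sorted layout only one block is.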
During busy hours, if a customer loads data and we sort by default, loading
will be slow. If the user instead sets the sort scope to NO_SORT, data
loading will be faster. Then, when compaction is triggered, all the data
will be sorted and written to the compacted segment. This helps queries,
but compaction performance will degrade, and that trade-off has to be
accepted.
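The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the CarbonData implementation: the row shape and the functions load_no_sort and compact_sorted are hypothetical names chosen for the example.

```python
from typing import List, Tuple

# Hypothetical row shape: (sort column value, payload).
Row = Tuple[int, str]


def load_no_sort(rows: List[Row]) -> List[Row]:
    """Write a segment in arrival order, paying no sorting cost at load time."""
    return list(rows)


def compact_sorted(segments: List[List[Row]]) -> List[Row]:
    """Merge all segments and sort the merged rows on the sort column."""
    merged = [row for segment in segments for row in segment]
    # This sort is the extra cost that moves from load time to compaction time.
    return sorted(merged, key=lambda row: row[0])


# Two fast loads during busy hours; rows stay in arrival order.
seg0 = load_no_sort([(7, "g"), (2, "b"), (9, "i")])
seg1 = load_no_sort([(4, "d"), (1, "a"), (8, "h")])

# Compaction later produces one sorted segment.
compacted = compact_sorted([seg0, seg1])
print(compacted)
```

The loads stay cheap because they only copy rows, and the sorting work is deferred to compaction, which is where the expected slowdown comes from.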
We can expose a property. By default the current flow is used, and if the
property is configured, the data is sorted before the compacted segment is
written.
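A minimal sketch of how such a switch could behave, in Python. The property name compaction.sort.enabled is a placeholder invented for this example, not an existing CarbonData property, and the compact function is likewise hypothetical.

```python
from typing import Dict, List

# Hypothetical property name, used only for this illustration.
SORT_ON_COMPACTION = "compaction.sort.enabled"


def compact(segments: List[List[int]], props: Dict[str, str]) -> List[int]:
    """Compact segments, sorting only when the property is set to "true"."""
    merged = [value for segment in segments for value in segment]
    # A missing property falls back to "false", which keeps the current flow:
    # rows are written in segment order without sorting.
    if props.get(SORT_ON_COMPACTION, "false").lower() == "true":
        return sorted(merged)
    return merged


segments = [[7, 2, 9], [4, 1, 8]]
default_result = compact(segments, {})
sorted_result = compact(segments, {SORT_ON_COMPACTION: "true"})
print(default_result)
print(sorted_result)
```

Keeping the default as "false" means existing users see no change in compaction behavior or cost unless they opt in.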
Compaction performance will take a hit. I will collect data on the
degradation and publish it. Please give your inputs on this.
Thank you,
Akash