Hi all,
Currently, when data is loaded with sort_scope set to NO_SORT, the data is still unsorted after those segments are compacted, which hurts query performance. This can be solved by sorting the data during compaction.
During busy hours, if a customer loads data and we sort by default, loading will be slow. If the user instead sets the sort scope to NO_SORT, data loading will be faster. Then, when compaction is triggered, all the data will be sorted and written to the compacted segment. This helps queries, but compaction performance will degrade, and that is the trade-off we have to accept.
We can expose a property: by default the current flow is used, and if the property is configured, the data is sorted before the compacted segment is written. Compaction performance will take a hit; I will collect data on the degradation and publish it.
Please give your inputs on this.
Thank you,
Akash
What’s your proposal?
Do you want the data to be loaded with NO_SORT and then sorted at compaction?
In reply to this post by akashrn5
+1,
Queries should be faster after compaction with sort. Please test and compare the compaction performance with and without sorting. Please also support dynamic configuration in CarbonProperties for compaction with sort and without sort, especially as their performance differs greatly.
-- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
In reply to this post by xuchuanyin
Hi xuchuanyin,
Basically, if a user wants data to be loaded fast, they will use NO_SORT. If we then sort the data during compaction and write it to the new compacted segment, the complete data will be sorted, which helps query performance. I hope that answers your question.
What’s your proposal for the corresponding grammar to do that?
Besides, if we sort only at compaction, is it proper to keep sort_scope at the table level? In this situation it should be at the segment level; keeping it at the table level will confuse the user. What do you think?
Sent from laptop
Currently, what I have in mind is that we sort during compaction only if all the loads involved in the compaction are NO_SORT. So the table-level setting we have today is fine: if the table has no_sort, the data will be sorted during compaction; if it has local sort, compaction follows the current flow. I don't think this causes any confusion.
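The rule described in the reply above can be sketched as follows. This is only an illustration in Python, not CarbonData code: `Segment`, `should_sort_during_compaction`, and the `sort_on_compaction_enabled` flag (standing in for the proposed property, whose real name has not been decided in this thread) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for a loaded segment and the sort scope it was written with."""
    segment_id: str
    sort_scope: str  # e.g. "NO_SORT" or "LOCAL_SORT"


def should_sort_during_compaction(segments: List[Segment],
                                  sort_on_compaction_enabled: bool) -> bool:
    """Return True only when the (hypothetical) property is enabled and
    every segment picked for compaction was loaded with NO_SORT."""
    if not sort_on_compaction_enabled or not segments:
        # Property not configured, or nothing to compact: keep the current flow.
        return False
    return all(s.sort_scope.upper() == "NO_SORT" for s in segments)


# All loads are NO_SORT and the property is on, so compaction would sort.
unsorted_loads = [Segment("0", "NO_SORT"), Segment("1", "no_sort")]
print(should_sort_during_compaction(unsorted_loads, True))

# One load used local sort, so compaction would follow the current flow.
mixed_loads = [Segment("0", "NO_SORT"), Segment("1", "LOCAL_SORT")]
print(should_sort_during_compaction(mixed_loads, True))
```

With the flag off, the function always returns False, which matches the idea that the current flow remains the default.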
In reply to this post by xuchuanyin
Hi Xuchuanyin,
The scope of this feature is to sort the data during compaction when it was loaded with the NO_SORT option. There are a few users who want to maximize data load speed and then fine-tune the data during off-peak time (when the system is least used) by running compaction. Sorting during compaction will use the SORT_COLUMNS property provided at table creation. Please find my response to your queries below.
1. will it be proper to keep the sort_scope in table level? It should be in segment level in this situation and keep it in table level will confuse the user
Yes. This is expected, as the feature specifically supports sorting data during compaction, so data loads are expected to use SORT_SCOPE as NO_SORT. But we cannot control that, so if multiple data loads are done with different sort_scope values, then during compaction we have to sort only the segments that are not sorted; the remaining segments should go through the merge sort flow only. After the compaction operation, all the data will be written using local sort.
Regards,
Manish Gupta
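The mixed case described in the reply above (sort only the unsorted segments, then merge everything) can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not CarbonData's compaction code: `compact` is a hypothetical function, rows are plain dictionaries, `sort_columns` is assumed to be non-empty, and each already-sorted segment is assumed to be a single sorted run.

```python
import heapq
from operator import itemgetter
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]


def compact(segments: Sequence[Tuple[str, List[Row]]],
            sort_columns: Sequence[str]) -> List[Row]:
    """Merge segments into one sorted list of rows.

    Each segment is a (sort_scope, rows) pair. Segments loaded with
    NO_SORT are sorted first; the others are assumed to be sorted on
    sort_columns already and are fed to the merge unchanged.
    """
    key = itemgetter(*sort_columns)  # requires at least one sort column
    runs = []
    for sort_scope, rows in segments:
        if sort_scope.upper() == "NO_SORT":
            rows = sorted(rows, key=key)  # sort only the unsorted segment
        runs.append(rows)
    # k-way merge of sorted runs, comparable to a merge sort flow
    return list(heapq.merge(*runs, key=key))


segments = [
    ("NO_SORT", [{"id": 3}, {"id": 1}]),
    ("LOCAL_SORT", [{"id": 2}, {"id": 4}]),
]
print([row["id"] for row in compact(segments, ["id"])])  # prints [1, 2, 3, 4]
```

The point of the sketch is that the extra sorting cost is paid only for the NO_SORT segments; the sorted ones take part in the merge only.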
So what’s your proposal for the grammar of this feature?
Do you want Carbon to do it silently, without any configuration or choice from the user? What I am concerned about is the performance of compaction. If the user uses auto-compaction, loading will be delayed even more if we compact using local sort. Moreover, if the user can bear the compaction time, would they want global sort or another scope instead? These two points are why I want to know the grammar for this feature.