Hi dev,
Currently, minor compaction considers only the number of segments, and major compaction considers only the total size of the segments. Consider a scenario where a user wants minor compaction to be triggered by the number of segments, but does not want it to merge any segment whose data size exceeds a threshold, for example 2GB, because merging such a large segment is unnecessary and time-consuming.
So we need to add a parameter that controls the size threshold for segments included in minor compaction, so that the user can exclude a segment from minor compaction once its data size exceeds the threshold. Of course, the parameter must have a default value.
So, what's your opinion about this?
-----
My English name is Sunday
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
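The proposal can be illustrated with a minimal Python sketch. This is not CarbonData code: the `Segment` class, the `pick_minor_candidates` function, and the parameter names are hypothetical, and the sizes and the trigger count of 4 are illustrative values only. It assumes a segment is excluded only when its size is strictly larger than the threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a table segment."""
    segment_id: str
    size_mb: int


def pick_minor_candidates(segments: List[Segment],
                          size_threshold_mb: int,
                          min_segment_count: int) -> List[Segment]:
    """Drop oversized segments first, then apply the count-based trigger."""
    small = [s for s in segments if s.size_mb <= size_threshold_mb]
    # Minor compaction is still driven by the number of (remaining) segments.
    return small if len(small) >= min_segment_count else []


segments = [
    Segment("0", 120),
    Segment("1", 4096),  # a large segment, above the 2GB threshold
    Segment("2", 80),
    Segment("3", 95),
    Segment("4", 110),
]
candidates = pick_minor_candidates(segments, size_threshold_mb=2048,
                                   min_segment_count=4)
print([s.segment_id for s in candidates])  # ['0', '2', '3', '4']
```

With these values the 4GB segment is skipped and the four small segments are still merged together.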
|
+1
Compacting a large segment takes a lot of resources and a long time, and may not give a good result.
Auto compaction is disabled, and we could give the threshold a large default value (maybe 1024GB) so that it does not change the default behavior.
A table-level threshold is also needed.
If the user wants to skip some segments, the user can adjust the value to do so.
-----
Best Regards
David Cai
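One way to read this suggestion, as a rough Python sketch: a table-level value overrides a global one, and the built-in default is so large that nothing is filtered unless the user lowers it. The property key `minor.compaction.size.threshold.mb` and the function `resolve_threshold_mb` are made up for illustration and are not existing CarbonData names.

```python
from typing import Mapping

# Assumed default of 1024GB, expressed in MB, so default behavior is unchanged.
DEFAULT_THRESHOLD_MB = 1024 * 1024

# Hypothetical property key, used only in this sketch.
THRESHOLD_KEY = "minor.compaction.size.threshold.mb"


def resolve_threshold_mb(table_properties: Mapping[str, str],
                         global_properties: Mapping[str, str]) -> int:
    """Prefer the table-level value, then the global one, then the default."""
    for props in (table_properties, global_properties):
        value = props.get(THRESHOLD_KEY)
        if value is not None:
            return int(value)
    return DEFAULT_THRESHOLD_MB


print(resolve_threshold_mb({}, {}))                          # 1048576
print(resolve_threshold_mb({}, {THRESHOLD_KEY: "10240"}))    # 10240
print(resolve_threshold_mb({THRESHOLD_KEY: "2048"},
                           {THRESHOLD_KEY: "10240"}))        # 2048
```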
|
In reply to this post by Zhangshunyu
Hi Zhangshunyu,
For this specific scenario, the user can use custom compaction and specify the IDs of the segments to be compacted. Also, if you just want size-based compaction, major compaction can be used.
So why do you want to support size-based minor compaction? It would basically lose the meaning of combining files based on their number.
If you are using minor compaction for this scenario just because it supports auto compaction, then maybe we can look into supporting an "auto_compaction_type" = "minor/major" option, or the user can write a script to trigger major compaction automatically.
Thanks,
Ajantha
(Replying to Zhangshunyu's message of Mon, 23 Nov 2020, 12:11 pm.)
|
Hi Sunday,
This looks like a valid scenario, because some user applications may run minor compaction by default and some may have auto compaction enabled, which is basically minor compaction, and if a segment is large we blindly go ahead and compact it.
So I think that instead of supporting major/minor auto compaction as a new feature, or making more changes to the existing code, we can add a little more intelligence to the code so that only segments smaller than the threshold size are considered in minor compaction.
Thanks
Regards,
Akash R
|
In reply to this post by Ajantha Bhat
Hi Ajantha, thanks for the reply.
Many users enable auto load merge for minor compaction, because their segments are generated every hour based on time. Sometimes the user loads some history data manually with the load command, and the segment for that history data is very large. The user does not want this segment to be merged by auto load merge during minor compaction, because that is time-consuming, so he wants to set a parameter that limits the size of the segments included in minor compaction.
-----
My English name is Sunday
|
In reply to this post by akashrn5
Yes, we need to support either auto load merge for major compaction or a size threshold limit for minor compaction.
In many cases, users of minor compaction only want to merge the small segments that are produced over time (the segments are generated as a time series); they do not want to merge a segment that is already large enough.
-----
My English name is Sunday
|
In reply to this post by David CaiQiang
agree
-----
My English name is Sunday
|
In reply to this post by Zhangshunyu
Hi Zhangshunyu, Thanks for providing more details on the problem.
If it is just for skipping history segments during auto minor compaction, adding a size threshold for minor compaction should be fine.
We can have a table-level, dynamically configurable threshold. If it is not configured, consider all the segments for merging. If it is configured, consider only the segments within the threshold value.
Thanks,
Ajantha
(Replying to Zhangshunyu's message of Mon, Nov 23, 2020 at 5:26 PM.)
|
OK
-----
My English name is Sunday
|
Hi Zhangshunyu,
We should refactor the code and change the property name from "carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a global property that defines the size above which a segment is not considered for auto compaction). This way we can use the same threshold for major and minor compaction. Let us avoid adding a new property for a minor compaction size threshold.
Consider 5 segments when carbon.compaction.threshold = 1GB:
Minor compaction would consider the segments based on "carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
Major compaction would consider all segments with size below "carbon.compaction.size.threshold".
Custom compaction should not consider any property and should do a forced compaction (existing behaviour).
Thanks
Kunal Kapoor
(Replying to Zhangshunyu's message of Tue, Nov 24, 2020 at 7:32 AM.)
|
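The selection rules Kunal describes could be sketched roughly as follows in Python. This is an illustration under assumptions, not the CarbonData implementation: the function `select_for_compaction` is hypothetical, the level threshold is simplified to a single segment count, a segment exactly at the size threshold is treated as eligible, and the segment sizes are made-up values.

```python
from typing import Dict, List, Sequence


def select_for_compaction(kind: str,
                          sizes_mb: Dict[str, int],
                          size_threshold_mb: int,
                          level_threshold: int = 4,
                          custom_ids: Sequence[str] = ()) -> List[str]:
    """Return the segment IDs that a compaction of the given kind considers."""
    if kind == "custom":
        # Custom compaction ignores both properties and forces the requested IDs.
        return [sid for sid in sizes_mb if sid in custom_ids]
    # Major and minor share one size threshold in this proposal.
    eligible = [sid for sid, size in sizes_mb.items()
                if size <= size_threshold_mb]
    if kind == "major":
        return eligible
    if kind == "minor":
        # Minor additionally needs enough eligible segments (simplified check).
        return eligible if len(eligible) >= level_threshold else []
    raise ValueError(f"unknown compaction kind: {kind}")


# Five segments, sizes in MB, with a shared size threshold of 1GB.
sizes_mb = {"0": 200, "1": 3000, "2": 150, "3": 300, "4": 250}

print(select_for_compaction("minor", sizes_mb, 1024))   # ['0', '2', '3', '4']
print(select_for_compaction("major", sizes_mb, 1024))   # ['0', '2', '3', '4']
print(select_for_compaction("custom", sizes_mb, 1024,
                            custom_ids=("1", "2")))     # ['1', '2']
```

In this sketch only custom compaction can touch the 3000MB segment, which matches the idea that custom compaction remains a forced operation.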
Hi Kunal, if we change the property name, existing users will need to change many places, such as their application code and cluster config files, to adapt to this change. What's your opinion? @David @Ajantha
-----
My English name is Sunday
|
The user has to change the application anyway to add a new property.
If we don't change the property name, then at least we can use the existing major compaction size threshold property instead of adding a new one.
(Replying to Zhangshunyu's message of Tue, 24 Nov 2020, 1:43 pm.)
|