Size control of minor compaction

13 messages

Size control of minor compaction

Zhangshunyu
This post was updated on .
Hi dev,
Currently, minor compaction only considers the number of segments, and major
compaction only considers the total (SUM) size of segments. Consider a
scenario where the user wants to use count-based minor compaction but does
not want to merge segments whose data size exceeds a threshold, for example
2 GB: there is no need to merge such big segments, and doing so is
time-consuming.
So we need to add a parameter that controls the size threshold for segments
included in minor compaction, so that the user can exclude any segment whose
data size exceeds the threshold. Of course, a default value must be provided.
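A minimal Python sketch of the proposed selection rule, for illustration only (CarbonData itself is written in Java/Scala, and this is not its actual code). The `Segment` class, `select_minor_segments` and its parameters are hypothetical names. It assumes oversized segments are simply filtered out and the remaining segments are taken in load order; whether a skipped big segment should also break a run of consecutive segments is not settled in this thread.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass(frozen=True)
class Segment:
    """Hypothetical view of a loaded segment: its id and on-disk size."""
    segment_id: str
    size_bytes: int


def select_minor_segments(
    segments: List[Segment], count_threshold: int, size_threshold_bytes: int
) -> List[str]:
    """Return the ids of segments to merge in one count-based minor compaction.

    Segments larger than size_threshold_bytes are skipped entirely (assumption:
    they are filtered out rather than ending the run of candidates).
    """
    candidates = [s for s in segments if s.size_bytes <= size_threshold_bytes]
    # Still count based: do nothing until enough small segments have accumulated.
    if len(candidates) < count_threshold:
        return []
    return [s.segment_id for s in candidates[:count_threshold]]


if __name__ == "__main__":
    loaded = [
        Segment("0", 200 * MB),
        Segment("1", 5 * GB),  # large segment, above the 2 GB example threshold
        Segment("2", 150 * MB),
        Segment("3", 180 * MB),
        Segment("4", 170 * MB),
    ]
    print(select_minor_segments(loaded, count_threshold=4, size_threshold_bytes=2 * GB))
```

With the 2 GB threshold from the example, segment 1 is left out and the four small segments `['0', '2', '3', '4']` are selected.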

So, what's your opinion about this?



-----
My English name is Sunday
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Size control of minor compaction

David CaiQiang
+1

It will take a lot of resources and a long time to compact a large segment,
and the result may not be good.

Since auto compaction is disabled by default, we could give a large default
value (maybe 1024GB), so the default behavior is not affected.

A table-level threshold is also needed.

If the user wants to skip some segments, they can adjust the value to do so.
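One way to read this suggestion: a table-level value overrides the system-level value, and when neither is set a very large default (1024 GB) applies, so existing behaviour does not change. This Python sketch assumes hypothetical property names (`minor_compaction_size_gb` at table level, `carbon.minor.compaction.size.gb` at system level); they are placeholders for illustration, not names agreed in this thread.

```python
from typing import Mapping, Optional

# Large default suggested above, so no segment is skipped unless configured.
DEFAULT_THRESHOLD_GB = 1024.0

# Hypothetical property names, used only for illustration.
TABLE_KEY = "minor_compaction_size_gb"
SYSTEM_KEY = "carbon.minor.compaction.size.gb"


def _parse_positive(value: Optional[str]) -> Optional[float]:
    """Return value as a positive float, or None if it is missing or invalid."""
    try:
        parsed = float(value)
    except (TypeError, ValueError):
        return None
    return parsed if parsed > 0 else None


def resolve_threshold_gb(
    table_props: Mapping[str, str], system_props: Mapping[str, str]
) -> float:
    """Table-level setting wins, then system-level, then the large default."""
    for raw in (table_props.get(TABLE_KEY), system_props.get(SYSTEM_KEY)):
        parsed = _parse_positive(raw)
        if parsed is not None:
            return parsed
    return DEFAULT_THRESHOLD_GB
```

Invalid or non-positive values fall through to the next level instead of failing the compaction; rejecting them at configuration time would be an equally reasonable choice.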



-----
Best Regards
David Cai

Re: Size control of minor compaction

Ajantha Bhat
In reply to this post by Zhangshunyu
Hi Zhangshunyu,

For this specific scenario, the user can use custom compaction by specifying
the IDs of the segments that should be compacted.

Also, if you just want size-based compaction, major compaction can be used.

So why are you thinking of supporting size-based minor compaction? It would
basically lose the meaning of combining files based on their number.

If you are using minor compaction for this scenario only because it
supports auto compaction, then maybe we can look into supporting an
"auto_compaction_type" = "minor/major"
option, or the user can write a script to trigger major compaction
automatically.
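If the `auto_compaction_type` option mentioned here were added, the post-load decision could look like the Python sketch below. The option name comes from the message; the function name, the case-insensitive parsing, and defaulting to `minor` (today's auto load merge behaviour, as described elsewhere in the thread) are assumptions.

```python
from typing import Mapping, Optional

VALID_AUTO_TYPES = ("minor", "major")


def pick_auto_compaction(
    table_props: Mapping[str, str], auto_load_merge_enabled: bool
) -> Optional[str]:
    """Return the compaction type to trigger after a load, or None to skip."""
    if not auto_load_merge_enabled:
        return None
    # Defaulting to "minor" keeps the current auto load merge behaviour.
    choice = table_props.get("auto_compaction_type", "minor").strip().lower()
    if choice not in VALID_AUTO_TYPES:
        raise ValueError(
            f"auto_compaction_type must be one of {VALID_AUTO_TYPES}, got {choice!r}"
        )
    return choice
```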

Thanks,
Ajantha


On Mon, 23 Nov, 2020, 12:11 pm Zhangshunyu, <[hidden email]> wrote:

> Hi dev,
> Currently, minor compaction only considers the number of segments, and major
> compaction only considers the total (SUM) size of segments. Consider a
> scenario where the user wants to use count-based minor compaction but does
> not want to merge segments whose data size exceeds a threshold, for example
> 2 GB: there is no need to merge such big segments, and doing so is
> time-consuming.
> So we need to add a parameter that controls the size threshold for segments
> included in minor compaction, so that the user can exclude any segment whose
> data size exceeds the threshold. Of course, a default value must be provided.
>
> So, what's your opinion about this?
>

Re: Size control of minor compaction

akashrn5
Hi Sunday,

This looks like a valid scenario, because some user applications may run
minor compaction by default, and some may have auto compaction enabled,
which is basically minor compaction; if a segment is large, we still blindly
compact it.

So I think that instead of adding auto compaction major/minor as a new
feature, or making bigger changes to the existing code, we can add a little
more intelligence to the code to identify the segments below the threshold
size and consider only those in minor compaction.

Thanks

Regards,
Akash R




Re: Size control of minor compaction

Zhangshunyu
This post was updated on .
In reply to this post by Ajantha Bhat
Hi Ajantha, thanks for the reply.
Many users enable auto load merge for minor compaction because segments are
generated every hour.
Sometimes the user loads some historical data manually with the LOAD
command, and the segment holding that historical data can be very large. The
user does not want this segment to be merged by auto load merge during minor
compaction because it is time-consuming, so they want a parameter that limits
the size of segments added to minor compaction.




Re: Size control of minor compaction

Zhangshunyu
In reply to this post by akashrn5
Yes, we need to support either auto load merge for major compaction or a
size threshold for minor compaction.
In many cases, users use minor compaction only to merge small segments in
time series (segments are generated as a time series); they don't want to
merge big segments that are already large enough.




Re: Size control of minor compaction

Zhangshunyu
In reply to this post by David CaiQiang
agree




Re: Size control of minor compaction

Ajantha Bhat
In reply to this post by Zhangshunyu
Hi Zhangshunyu, thanks for providing more details on the problem.

If it is just for skipping history segments during auto minor compaction,
adding a size threshold for minor compaction should be fine.
We can have a table-level, dynamically configurable threshold.
If it is not configured, all segments are considered for merging. If it is
configured, only segments within the threshold value are considered.
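This rule can be pinned down in a few lines: an unset threshold means every segment is a candidate, and a configured one keeps only segments within it. In this Python sketch the function name and the use of megabytes are assumptions, and "within" is read as less than or equal to the threshold. Passing in the current table value on every run, rather than caching it, is what would let an updated table property take effect at the next compaction.

```python
from typing import List, Optional, Sequence, Tuple


def eligible_segments(
    segments: Sequence[Tuple[str, int]], threshold_mb: Optional[int]
) -> List[str]:
    """segments: (segment_id, size_mb) pairs in load order."""
    if threshold_mb is None:
        # Not configured: consider all segments, i.e. today's behaviour.
        return [segment_id for segment_id, _ in segments]
    # Configured: keep only segments whose size is within the threshold.
    return [segment_id for segment_id, size_mb in segments if size_mb <= threshold_mb]
```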

Thanks,
Ajantha

On Mon, Nov 23, 2020 at 5:26 PM Zhangshunyu <[hidden email]> wrote:

> Yes, we need to support either auto load merge for major compaction or a
> size threshold for minor compaction.
> In many cases, users use minor compaction only to merge small segments in
> time series (segments are generated as a time series); they don't want to
> merge big segments that are already large enough.
>

Re: Size control of minor compaction

Zhangshunyu
OK




Re: Size control of minor compaction

kunalkapoor
Hi Zhangshunyu,
We should refactor the code and rename the property
"carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a
global property that defines the size above which a segment is not
considered for auto compaction). This way we can use the same threshold for
both major and minor compaction. Let us avoid adding a new property just for
a minor compaction size threshold.

Consider 5 segments when carbon.compaction.size.threshold = 1GB:

Minor compaction would consider segments based on
"carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
Major compaction would consider all segments with size below
"carbon.compaction.size.threshold".
Custom compaction should not consider any property and should do a forced
compaction (existing behaviour).
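The three behaviours listed above can be sketched as one planning function sharing a single size threshold. This Python sketch is illustrative only: `CompactionType` and `plan_compaction` are hypothetical names, minor compaction is reduced to a single level (`carbon.compaction.level.threshold` is actually configured per level, e.g. `4,3`), and the sample sizes are made up; they are not the five-segment example the message refers to.

```python
from enum import Enum
from typing import Iterable, List, Sequence, Tuple


class CompactionType(Enum):
    MINOR = "minor"
    MAJOR = "major"
    CUSTOM = "custom"


def plan_compaction(
    kind: CompactionType,
    segments: Sequence[Tuple[str, float]],
    size_threshold_gb: float,
    level_threshold: int,
    custom_ids: Iterable[str] = (),
) -> List[str]:
    """Return segment ids to merge; segments are (id, size_gb) in load order."""
    if kind is CompactionType.CUSTOM:
        # Forced compaction: ignore both thresholds and use the ids given.
        wanted = set(custom_ids)
        return [sid for sid, _ in segments if sid in wanted]
    below = [sid for sid, size_gb in segments if size_gb < size_threshold_gb]
    if kind is CompactionType.MAJOR:
        # Major: every segment below the shared size threshold.
        return below
    # Minor (single level): merge once level_threshold small segments exist.
    if len(below) < level_threshold:
        return []
    return below[:level_threshold]


if __name__ == "__main__":
    sample = [("0", 0.3), ("1", 4.0), ("2", 0.5), ("3", 0.2), ("4", 0.8)]
    for kind in CompactionType:
        print(kind.value, plan_compaction(kind, sample, 1.0, 3, custom_ids=["1", "3"]))
```

With a 1 GB threshold and a level threshold of 3, the 4 GB segment is skipped by both minor (`['0', '2', '3']`) and major (`['0', '2', '3', '4']`), while custom compaction still merges exactly the ids it was given (`['1', '3']`).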

Thanks
Kunal Kapoor

On Tue, Nov 24, 2020 at 7:32 AM Zhangshunyu <[hidden email]> wrote:

> OK

Re: Size control of minor compaction

kunalkapoor
Hi Zhangshunyu,
We should refactor the code and rename the property
"carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a
global property that defines the size above which a segment is not
considered for auto compaction). This way we can use the same threshold for
both major and minor compaction. Let us avoid adding a new property just for
a minor compaction size threshold.

Minor compaction would consider segments based on
"carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
Major compaction would consider all segments with size below
"carbon.compaction.size.threshold".
Custom compaction should not consider any property and should do a forced
compaction (existing behaviour).

Thanks
Kunal Kapoor

On Tue, Nov 24, 2020 at 10:32 AM Kunal Kapoor <[hidden email]>
wrote:

> Hi Zhangshunyu,
> We should refactor the code and rename the property
> "carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a
> global property that defines the size above which a segment is not
> considered for auto compaction). This way we can use the same threshold for
> both major and minor compaction. Let us avoid adding a new property just for
> a minor compaction size threshold.
>
> Consider 5 segments when carbon.compaction.size.threshold = 1GB:
>
> Minor compaction would consider segments based on
> "carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
> Major compaction would consider all segments with size below
> "carbon.compaction.size.threshold".
> Custom compaction should not consider any property and should do a forced
> compaction (existing behaviour).
>
> Thanks
> Kunal Kapoor
>
> On Tue, Nov 24, 2020 at 7:32 AM Zhangshunyu <[hidden email]> wrote:
>
>> OK
>

Re: Size control of minor compaction

Zhangshunyu
This post was updated on .
Hi Kunal, if we change the property name, existing users will need to change
many places, such as their application code and cluster configuration files,
to adapt to this change. What's your opinion? @David @Ajantha




Re: Size control of minor compaction

kunalkapoor
The user has to change the application anyway to add a new property.
If we don't change the property name, then at least we can use the existing
major compaction size threshold property instead of adding a new one.
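On the compatibility concern, one possible middle ground (not proposed in the thread) is to read the new name first and fall back to the existing `carbon.major.compaction.size`, warning that the old name is deprecated. The two property names are the ones discussed above; the function, the warning, and the fallback policy are assumptions in this Python sketch.

```python
import warnings
from typing import Mapping

NEW_KEY = "carbon.compaction.size.threshold"
OLD_KEY = "carbon.major.compaction.size"


def read_size_threshold(props: Mapping[str, str], default: str) -> str:
    """Prefer the new property; accept the old one so existing configs keep working."""
    if NEW_KEY in props:
        return props[NEW_KEY]
    if OLD_KEY in props:
        warnings.warn(
            f"{OLD_KEY} is deprecated, please use {NEW_KEY}",
            DeprecationWarning,
            stacklevel=2,
        )
        return props[OLD_KEY]
    return default
```

This keeps old application code and cluster config files working while still letting both compaction types share one threshold.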

On Tue, 24 Nov 2020, 1:43 pm Zhangshunyu, <[hidden email]> wrote:

> Hi Akash, if we change the property name, existing users will need to change
> many places, such as their application code and cluster configuration
> files, to adapt to this change. What's your opinion? @David @Ajantha
>