[Discussion] About syntax of compaction on specified segments

[Discussion] About syntax of compaction on specified segments

Jin Zhou
Hi, community:

I'm working on PR-1812
(https://github.com/apache/carbondata/pull/1812),
which aims to support user-specified segments in the compaction operation.

After the previous discussions, I see three possible ways to implement
this feature:
1) Extending the existing SQL syntax of Major and Minor compaction:
    ALTER TABLE tablename compact 'MAJOR' '1, 2, 3, 4';
    ALTER TABLE tablename compact 'MINOR' '1, 2, 3, 4';

2) Adding support for the CARBON_INPUT_SEGMENTS property to Major and Minor
compaction:
    SET carbon.input.segments.dbname.tablename=1,3;
    ALTER TABLE tablename compact 'MAJOR';

3) Adding a new compaction type and some associated configs, for example
'CUSTOM':
    ALTER TABLE tablename compact 'CUSTOM' '1, 2, 3, 4';

I'm grateful for the advice from chenliang, ravipesala and gvramana. The detailed
discussion history can be seen on the PR page:
(https://github.com/apache/carbondata/pull/1812)

Now I'm a bit confused and would really appreciate your suggestions :)






--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Re: [Discussion] About syntax of compaction on specified segments

xuchuanyin

I think the syntax for segment compaction should be similar to that of the other segment management operations.
Currently in carbondata, we delete segments using the syntax:
```
DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0,5,8)
```
And
```
DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06'
```

So we can imitate the above syntax and get the following:
```
ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR' WHERE SEGMENT.ID IN (0,5,8)
```
And
```
ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'
```
This way we can support compacting segments by specifying either IDs or dates.
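To make the two proposed conditions concrete, here is a minimal Python sketch of how segments might be selected by ID list or by start-time window. It is only an illustration: the `Segment` record and the `select_segments` helper are hypothetical names invented for this example and are not carbondata classes or APIs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a segment entry; not a carbondata class."""
    segment_id: int
    start_time: datetime


def select_segments(
    segments: Iterable[Segment],
    ids: Optional[Set[int]] = None,
    before: Optional[datetime] = None,
    after: Optional[datetime] = None,
) -> List[Segment]:
    """Keep the segments that match every condition that was supplied."""
    selected = []
    for seg in segments:
        # SEGMENT.ID IN (...)
        if ids is not None and seg.segment_id not in ids:
            continue
        # SEGMENT.STARTTIME BEFORE '...'
        if before is not None and seg.start_time >= before:
            continue
        # SEGMENT.STARTTIME AFTER '...'
        if after is not None and seg.start_time <= after:
            continue
        selected.append(seg)
    return selected


# Made-up sample data for illustration only.
segments = [
    Segment(0, datetime(2017, 4, 20, 9, 0, 0)),
    Segment(5, datetime(2017, 5, 15, 9, 0, 0)),
    Segment(8, datetime(2017, 6, 10, 9, 0, 0)),
]

by_id = select_segments(segments, ids={0, 5, 8})
by_time = select_segments(
    segments,
    before=datetime(2017, 6, 1, 12, 5, 6),
    after=datetime(2017, 5, 1, 12, 5, 6),
)
```

With this sample data, the ID condition picks all three segments, while the time window keeps only the segment that started between the two timestamps.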


-----Original Message-----
From: xuchuanyin <[hidden email]>
Sent: 2018-03-12 20:23:20 (Monday)
To: "Jin Zhou" <[hidden email]>
Cc: carbondata <[hidden email]>
Subject: Re: [Discussion] About syntax of compaction on specified segments


What does 1/2/3/4 mean in your example? If these are segment IDs, compacting segments by specifying a date range is probably also needed.


In reply to Jin Zhou's post of 2018-03-12 10:31.

Re: [Discussion] About syntax of compaction on specified segments

xm_zzc
In reply to this post by Jin Zhou
I prefer option 3. A new compaction type called 'CUSTOM', which
is distinct from minor and major, will not confuse users.




Re: Re: [Discussion] About syntax of compaction on specified segments

xuchuanyin
I think 'major' and 'minor' are enough to describe compaction; there is no need to add another type. Besides, 'custom' is somewhat ambiguous.

As described in the README,
```
In Major compaction, multiple segments can be merged into one large segment.
User will specify the compaction size until which segments can be merged.
```
The existing major compaction (the default, without a condition) is size based: carbondata chooses the segments by size. For the new major compaction (with a condition), we specify the segments and let carbondata merge them into one large segment.
The two do not differ in purpose, which is to merge some segments into a larger one; they differ only in how segments are selected, by segment size or by condition. So we don't need another compaction type.

Re: [Discussion] About syntax of compaction on specified segments

bill.zhou
In reply to this post by Jin Zhou
Hi community,
I prefer option 3, for the following reasons.

1. It is not a good idea to change the existing behavior of major and minor
compaction. Customers who already use carbondata would be confused; we should
stay compatible with the existing feature.
2. Custom compaction differs from major and minor compaction in usability:
it is an advanced feature that requires the customer to know much more
about carbondata's internal logic.




Re: [Discussion] About syntax of compaction on specified segments

xuchuanyin
In reply to this post by Jin Zhou
Hi, all:
Here I summarize my opinion and propose option 4.

Option 4:
4) Extending the existing SQL syntax of Major and Minor compaction based on
the syntax of delete segment:
    ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT.ID IN (1,2,3,4)
    ALTER TABLE tablename COMPACT 'MINOR' WHERE SEGMENT.ID IN (1,2,3,4)
    ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'
    ALTER TABLE tablename COMPACT 'MINOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'
  Notice: The syntax is slightly different from that of Option 1.

The existing major compaction (the default, without a condition) is size based:
carbondata chooses the segments by size. For the new major compaction
(with a condition), we specify the segments and let carbondata merge them into
one large segment.
In effect, the existing compaction statement looks like this:
    ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT_SIZE > XXMB
The condition part 'WHERE SEGMENT_SIZE > XXMB' is implicit, whereas the
condition part in the new compaction statement is explicit.
The two do not differ in purpose, which is to merge some segments into a larger
one; they differ only in how segments are selected, by segment size or by
condition. So we don't need another compaction type.
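The "implicit versus explicit condition" argument above can be sketched in a few lines of Python. This is an illustration of the reasoning only, not carbondata code: `Segment`, `segments_to_compact` and the 1024 MB default are hypothetical, and the size comparison simply mirrors the `SEGMENT_SIZE > XXMB` framing used in this message rather than the real selection logic.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment record; not a carbondata class."""
    segment_id: int
    size_mb: int


Condition = Callable[[Segment], bool]


def segments_to_compact(
    segments: Iterable[Segment],
    condition: Optional[Condition] = None,
    implicit_size_mb: int = 1024,  # placeholder for the "XXMB" in the message
) -> List[Segment]:
    """Use the explicit condition if one is given, else the implicit size rule."""
    if condition is None:
        # Mirrors the implicit "WHERE SEGMENT_SIZE > XXMB" framing above.
        def by_size(seg: Segment) -> bool:
            return seg.size_mb > implicit_size_mb

        condition = by_size
    return [seg for seg in segments if condition(seg)]


# Made-up sample data for illustration only.
table = [Segment(1, 200), Segment(2, 2048), Segment(3, 50), Segment(4, 4096)]

# No condition: selection falls back to the implicit size rule.
implicit = segments_to_compact(table)

# Explicit condition, analogous to "WHERE SEGMENT.ID IN (1,2,3)".
explicit = segments_to_compact(table, condition=lambda seg: seg.segment_id in {1, 2, 3})
```

Both calls go through the same function and differ only in the predicate, which is the point being made: one operation, two ways of choosing its input.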




Re: [Discussion] About syntax of compaction on specified segments

luffy
In reply to this post by Jin Zhou
Having major and minor compaction is OK; there is no need for another type like custom. I am more
concerned about compaction performance.




Re: [Discussion] About syntax of compaction on specified segments

manishgupta88
Hi,

I agree with @gvramana <https://github.com/gvramana>

   1. We should *not use* the Major/Minor compaction types, as they have a
   specific meaning and in both the system decides whether a segment is
   valid to be compacted.
   2. We should *not use* carbon.input.segments.default.seg_compact to set
   the segments to be compacted.
   3. We should introduce a new compaction type 'CUSTOM' in the DDL, as
   suggested by @gvramana <https://github.com/gvramana>, because it is
   effectively a forced compaction of the given segments: it will not check
   the size and frequency of segments. We can work on using the syntax below
   for custom compaction.

*ALTER TABLE [db_name.]table_name COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0,5,8)*

Once a table is compacted using Custom compaction, minor compaction
no longer applies to the custom compacted segment. A custom compacted
segment should participate only in major compaction, and only if it
satisfies the major compaction size property.
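The rule for custom compacted segments described above could be expressed roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `Segment`, `eligible_for_minor` and `eligible_for_major` are hypothetical names, and "satisfies the major compaction size property" is assumed here to mean that the segment size is within a configured limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment record; not a carbondata class."""
    segment_id: int
    size_mb: int
    custom_compacted: bool = False


def eligible_for_minor(seg: Segment) -> bool:
    """Custom compacted segments are skipped by minor compaction."""
    return not seg.custom_compacted


def eligible_for_major(seg: Segment, major_size_mb: int) -> bool:
    """Assumption: the size property is satisfied when the segment fits the limit."""
    return seg.size_mb <= major_size_mb


# Made-up sample data for illustration only.
custom_small = Segment(9, 300, custom_compacted=True)
custom_large = Segment(11, 2048, custom_compacted=True)
plain = Segment(10, 300)

minor_candidates = [s for s in (custom_small, custom_large, plain) if eligible_for_minor(s)]
major_candidates = [s for s in (custom_small, custom_large, plain) if eligible_for_major(s, 1024)]
```

In this sketch only the ordinary segment remains a minor compaction candidate, while the small custom compacted segment can still be picked up by major compaction.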

Regards
Manish Gupta

On Tue, Mar 13, 2018 at 2:55 PM, luffy <[hidden email]> wrote:

> compaction have major and minor is ok,not need another like custom,i am
> more
> concerned about compaction performance.

Re: [Discussion] About syntax of compaction on specified segments

Liang Chen
Administrator
Hi

Thanks to jinzhou for starting this discussion.

I also favor the solution proposed by manish, which does not affect the
current Major and Minor compaction behavior.

Regards
Liang
