Hi, community:
I'm working on PR-1812 (https://github.com/apache/carbondata/pull/1812), which aims to support user-specified segments in compaction. After previous discussions, I think there are three possible ways to implement this:

1) Extend the existing SQL syntax of MAJOR and MINOR compaction:
ALTER TABLE tablename compact 'MAJOR' '1, 2, 3, 4';
ALTER TABLE tablename compact 'MINOR' '1, 2, 3, 4';

2) Add support for the CARBON_INPUT_SEGMENTS property to MAJOR and MINOR compaction:
SET carbon.input.segments.dbname.tablename=1,3;
ALTER TABLE tablename compact 'MAJOR';

3) Add a new compaction type, for example 'CUSTOM', with some associated configs:
ALTER TABLE tablename compact 'CUSTOM' '1, 2, 3, 4'

I'm grateful for advice from chenliang, ravipesala and gvramana; the detailed discussion history can be seen on the PR page (https://github.com/apache/carbondata/pull/1812).

Now I'm a bit confused and really need your suggestions :)

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
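All three options end up handing the compaction code the same thing: a list of segment IDs. They differ only in where that list comes from (a quoted literal in the DDL, or a session property value). A minimal Python sketch of that shared step, assuming a hypothetical `parse_segment_ids` helper; this is not CarbonData's actual parser:

```python
def parse_segment_ids(raw: str) -> list[int]:
    """Parse a comma-separated segment list such as '1, 2, 3, 4'.

    Hypothetical helper for illustration only.
    """
    ids = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and extra spaces
        if not token.isdigit():
            raise ValueError(f"invalid segment id: {token!r}")
        ids.append(int(token))
    # Drop duplicates while preserving the order the user gave.
    return list(dict.fromkeys(ids))


# Options 1 and 3: the list is a quoted literal inside the DDL.
print(parse_segment_ids("1, 2, 3, 4"))

# Option 2: the list is the value of a session property.
prop = "carbon.input.segments.dbname.tablename=1,3"
key, _, value = prop.partition("=")
print(key, parse_segment_ids(value))
```

Whichever syntax is chosen, the validation (non-numeric or duplicate IDs) would be the same; only the front end changes.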
I think the syntax for compacting segments should be similar to that of other segment management operations. Currently in CarbonData, we delete segments using this syntax:

```
DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0,5,8)
```

and

```
DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06'
```

So we can imitate the above syntax and get the following:

```
ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR' WHERE SEGMENT.ID IN (0,5,8)
```

and

```
ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'
```

This way we can support compacting segments by specifying either IDs or dates.

-----Original Message-----
From: xuchuanyin
Sent: 2018-03-12 20:23:20 (Monday)
To: "Jin Zhou"
Cc: carbondata
Subject: Re: [Discussion] About syntax of compaction on specified segments

What does 1/2/3/4 mean in your example? If they are segment IDs, compacting segments by specifying a date range is probably also needed.
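The two proposed conditions reduce to two simple filters over segment metadata. A minimal Python sketch, assuming a hypothetical `Segment` record that carries only an ID and a start time, and treating BEFORE/AFTER as strict (exclusive) bounds, which the thread does not specify:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Segment:
    # Hypothetical, simplified view of a segment's metadata.
    segment_id: int
    start_time: datetime


def select_by_ids(segments, ids):
    """WHERE SEGMENT.ID IN (...)"""
    wanted = set(ids)
    return [s for s in segments if s.segment_id in wanted]


def select_by_start_time(segments, after, before):
    """WHERE SEGMENT.STARTTIME BEFORE <before> AND SEGMENT.STARTTIME AFTER <after>"""
    return [s for s in segments if after < s.start_time < before]


FMT = "%Y-%m-%d %H:%M:%S"
segments = [
    Segment(0, datetime(2017, 4, 20, 9, 0, 0)),
    Segment(5, datetime(2017, 5, 10, 8, 30, 0)),
    Segment(8, datetime(2017, 5, 28, 17, 45, 0)),
    Segment(9, datetime(2017, 6, 3, 11, 0, 0)),
]

print([s.segment_id for s in select_by_ids(segments, [0, 5, 8])])  # [0, 5, 8]

window = select_by_start_time(
    segments,
    after=datetime.strptime("2017-05-01 12:05:06", FMT),
    before=datetime.strptime("2017-06-01 12:05:06", FMT),
)
print([s.segment_id for s in window])  # [5, 8]
```

Segment 0 starts before the window and segment 9 after it, so only 5 and 8 are selected by the date range.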
In reply to this post by Jin Zhou
I prefer option 3. A new compaction type called 'CUSTOM', distinct from MINOR and MAJOR, will not confuse users.
I think 'major' and 'minor' are enough to describe compaction; there is no need to add another type. Besides, 'custom' is somewhat ambiguous.
As described in the README:

```
In Major compaction, multiple segments can be merged into one large segment. User will specify the compaction size until which segments can be merged.
```

The existing major compaction (the default, without a condition) is size-based: CarbonData chooses the segments by size. With the new major compaction (with a condition), we specify the segments and let CarbonData merge them into one large segment. The two do not differ in purpose, which is merging some segments into a larger one; they differ only in how the segments are selected, by size or by condition. So we don't need another compaction type.
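The argument that only the selection step differs can be sketched in Python. This is a simplification under stated assumptions: `select_by_size` uses a made-up rule (every segment at or below a threshold) rather than CarbonData's real size-based policy, and `merge` only reports which segments would be combined and their total size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical, simplified segment record.
    segment_id: int
    size_mb: float


def select_by_size(segments, threshold_mb):
    """Default MAJOR: the system picks segments by size (simplified rule)."""
    return [s for s in segments if s.size_mb <= threshold_mb]


def select_by_condition(segments, predicate):
    """Conditional MAJOR: the user's WHERE clause picks the segments."""
    return [s for s in segments if predicate(s)]


def merge(selected):
    """Shared step: combine the chosen segments into one larger segment.

    Stand-in for the real merge; returns (merged ids, total size).
    """
    ids = tuple(s.segment_id for s in selected)
    return ids, sum(s.size_mb for s in selected)


segments = [Segment(0, 100.0), Segment(1, 900.0), Segment(2, 150.0), Segment(3, 200.0)]

print(merge(select_by_size(segments, 512)))  # ((0, 2, 3), 450.0)
print(merge(select_by_condition(segments, lambda s: s.segment_id in {1, 2})))  # ((1, 2), 1050.0)
```

Both paths call the same `merge`; swapping the selector is the only difference, which is the point being made about not needing a separate type.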
In reply to this post by Jin Zhou
Hi community,

I prefer option 3 for the following reasons:

1. It is not a good idea to change the existing behavior of MAJOR and MINOR compaction. Customers who already use CarbonData would be confused; we should stay compatible with the existing feature.
2. Custom compaction differs from MAJOR and MINOR in usability: it is a high-level feature that requires the customer to understand CarbonData's internal logic much better.
In reply to this post by Jin Zhou
Hi, all:
Here is a summary of my opinion, along with a fourth option:

4) Extend the existing SQL syntax of MAJOR and MINOR compaction, based on the syntax of delete segment:

ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT.ID IN (1,2,3,4)
ALTER TABLE tablename COMPACT 'MINOR' WHERE SEGMENT.ID IN (1,2,3,4)
ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'
ALTER TABLE tablename COMPACT 'MINOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'

Note: this syntax is slightly different from that of option 1.

The existing major compaction (the default, without a condition) is size-based: CarbonData chooses the segments by size. With the new major compaction (with a condition), we specify the segments and let CarbonData merge them into one large segment. In effect, the existing compaction statement already looks like this:

ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT_SIZE > XXMB

except that the condition 'WHERE SEGMENT_SIZE > XXMB' is implicit, whereas in the new statement the condition is explicit. The two do not differ in purpose, which is merging some segments into a larger one; they differ only in how the segments are selected, by size or by condition. So we don't need another compaction type.
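To make the "implicit vs. explicit condition" point concrete, here is a minimal Python sketch, assuming a hypothetical `where_to_predicate` helper (not CarbonData's parser) that turns the two option-4 condition forms into a selection predicate. It only understands `SEGMENT.ID IN (...)` and `SEGMENT.STARTTIME BEFORE/AFTER '...'` bounds, all treated as joined by AND; the default size-based MAJOR would simply supply its own implicit predicate instead.

```python
import re
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
# Hypothetical patterns covering only the two condition forms proposed above.
ID_IN = re.compile(r"SEGMENT\.ID\s+IN\s*\(([^)]*)\)", re.IGNORECASE)
TIME_COND = re.compile(
    r"SEGMENT\.STARTTIME\s+(BEFORE|AFTER)\s+'([^']+)'", re.IGNORECASE
)


def where_to_predicate(where: str):
    """Turn a WHERE clause into a predicate(segment_id, start_time) -> bool."""
    m = ID_IN.search(where)
    if m:
        ids = {int(x) for x in m.group(1).split(",") if x.strip()}
        return lambda seg_id, start: seg_id in ids

    checks = []
    for op, ts in TIME_COND.findall(where):
        bound = datetime.strptime(ts, FMT)
        if op.upper() == "BEFORE":
            checks.append(lambda start, b=bound: start < b)
        else:  # AFTER
            checks.append(lambda start, b=bound: start > b)
    if not checks:
        raise ValueError(f"unsupported condition: {where!r}")
    # Every bound must hold (AND semantics; OR is not supported here).
    return lambda seg_id, start: all(check(start) for check in checks)


by_id = where_to_predicate("SEGMENT.ID IN (1,2,3,4)")
by_time = where_to_predicate(
    "SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' "
    "AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'"
)
print(by_id(3, datetime(2017, 1, 1)), by_id(5, datetime(2017, 1, 1)))  # True False
print(by_time(0, datetime(2017, 5, 15)), by_time(0, datetime(2017, 6, 2)))  # True False
```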
In reply to this post by Jin Zhou
Having MAJOR and MINOR compaction is fine; there is no need for another type such as CUSTOM. I am more concerned about compaction performance.
Hi,
I agree with @gvramana <https://github.com/gvramana>:

1. We should *not use* the MAJOR/MINOR compaction types, as they have a specific meaning and the system controls both of them when deciding whether a segment is eligible for compaction.
2. We should *not use* carbon.input.segments.default.seg_compact to set the segments to be compacted.
3. We should introduce a new compaction type, 'CUSTOM', in the DDL, as suggested by @gvramana, because it is effectively a forced compaction of the given segments: it does not check the size or frequency of segments. We can work with the following syntax for custom compaction:

*ALTER TABLE [db_name.]table_name COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0,5,8)*

Once segments are compacted using custom compaction, minor compaction no longer applies to the resulting custom-compacted segment. A custom-compacted segment should participate in major compaction only if it satisfies the major compaction size property.

Regards
Manish Gupta
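The rules proposed here for CUSTOM compaction can be sketched as follows. This is a minimal Python sketch under assumptions: the `custom_compacted` flag, the `0+5+8` naming of the merged segment, and the `major_size_mb` comparison are all hypothetical stand-ins, not CarbonData's real segment metadata or its actual major-compaction size property.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str
    size_mb: float
    custom_compacted: bool = False  # hypothetical flag


def custom_compact(segments, ids):
    """Force-merge exactly the listed segments, with no size or frequency checks."""
    chosen = [s for s in segments if s.segment_id in ids]
    missing = set(ids) - {s.segment_id for s in chosen}
    if missing:
        raise ValueError(f"unknown segments: {sorted(missing)}")
    merged = Segment(
        segment_id="+".join(s.segment_id for s in chosen),  # placeholder naming
        size_mb=sum(s.size_mb for s in chosen),
        custom_compacted=True,
    )
    return [s for s in segments if s.segment_id not in ids] + [merged]


def minor_candidates(segments):
    """Custom-compacted segments never take part in minor compaction."""
    return [s for s in segments if not s.custom_compacted]


def major_candidates(segments, major_size_mb):
    """Segments join major compaction only if they satisfy the size limit."""
    return [s for s in segments if s.size_mb <= major_size_mb]


segs = [Segment("0", 100.0), Segment("5", 200.0), Segment("8", 50.0), Segment("9", 700.0)]
after = custom_compact(segs, {"0", "5", "8"})
print([s.segment_id for s in minor_candidates(after)])         # ['9']
print([s.segment_id for s in major_candidates(after, 512.0)])  # ['0+5+8']
```

Note that segment 9 is left alone by the CUSTOM run even though it was never selected, and that the forced merge ignores size entirely; the size limit only comes back into play when major compaction later considers the merged segment.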
Hi
Thanks to Jin Zhou for starting this discussion. I also propose adopting the solution proposed by Manish, which does not affect the current MAJOR and MINOR compaction behavior.

Regards
Liang