Hi community,
CarbonData currently supports two types of compaction: minor and major. CarbonData performs major compaction according to the user-defined segment size, but users have no control over which segments are merged.

We plan to extend major compaction to support user-specified segments. This will be useful in the following cases:
1) We can precisely control which part of a table is merged when the table is very large.
2) Each table can have its own compaction strategy, controlled by the user app.

The proposed syntax:
ALTER TABLE [db_name].table_name COMPACT [SEGMENT seg_id1,seg_id2] 'MAJOR'
in which [SEGMENT seg_id1,seg_id2] is optional, so the statement stays compatible with the original syntax.

-- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
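To make the optional clause concrete, here is a minimal sketch in Python that assembles a statement in the proposed syntax. The helper name `build_compact_statement` and the table name `sales` are hypothetical, and the SEGMENT clause is only the proposal from this thread, not a confirmed CarbonData feature.

```python
from typing import Optional, Sequence


def build_compact_statement(
    table_name: str,
    db_name: Optional[str] = None,
    segment_ids: Optional[Sequence[str]] = None,
) -> str:
    """Assemble a major-compaction statement in the proposed syntax.

    Hypothetical helper for illustration only; the SEGMENT clause is the
    proposal from this thread, not a confirmed CarbonData feature.
    """
    # Qualify the table with its database only when one is given.
    target = f"{db_name}.{table_name}" if db_name else table_name
    parts = ["ALTER TABLE", target, "COMPACT"]
    # The SEGMENT clause is optional; omitting it yields the original syntax.
    if segment_ids:
        parts.append("SEGMENT " + ",".join(str(s) for s in segment_ids))
    parts.append("'MAJOR'")
    return " ".join(parts)


# Original syntax: CarbonData chooses the segments to merge.
print(build_compact_statement("sales", db_name="default"))
# Proposed syntax: the caller names the segments to merge.
print(build_compact_statement("sales", db_name="default", segment_ids=["0", "1"]))
```

Leaving `segment_ids` empty produces the statement that works today, which is the compatibility property the proposal relies on.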
+1, this feature sounds good.
In reply to this post by Jin Zhou
Hi Jin Zhou
Thanks for starting this discussion.

1. For your first proposal ("we can precisely control which part of table to be merged when table is very large"): currently, a segment is a system-internal concept that is not exposed to the outside. Can you describe the exact problems you have encountered? We can look for an alternative solution to them.

2. For your second proposal ("each table can has its own compaction strategy which controlled by user app"): +1, agreed. Could you please create an Apache JIRA for this? We would like to invite you to participate in implementing this feature together :)

Regards
Liang
@Liang Chen, thank you for your reply.
After thinking seriously about your suggestion, I agree that the two problems should be considered separately.

For problem 2, user-specified compaction segments are indeed not a good solution. I am glad to do some work on this.

For problem 1, I agree with you that segments should not be exposed to most users in standard APIs, because a segment is to some extent an internal concept. But since we already have segment management commands such as "show segments for table" and "alter table compact", it seems we cannot call it a completely internal concept. So I think we could support user-specified segments only in management functions like compaction, and treat it as a hidden, advanced usage that is not recommended in general cases.

Regards
Jin Zhou
Hi Jin Zhou
OK, thanks for your proposal. Can you raise one PR to support the two features?

Regards
Liang