Posted by xuchuanyin on Apr 04, 2019; 11:37am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Add-new-compaction-type-for-compacting-delta-data-file-tp76597p76703.html
Emm, eliminating delta files to enhance query performance is quite reasonable,
and compaction is a candidate for it. However, I have some questions about
this; maybe they will help with your design.
Q1:
A segment with delta files means there have been some UD (update/delete)
operations on this segment before, which suggests there will likely be more UD
operations in the future. So, is it worth compacting this segment?
Also please keep in mind that UD operations will be blocked while the
compaction is running.
Q2:
I feel there may already be too many kinds of compaction in CarbonData...
What if in the future I want another compaction that merges smaller
carbondata files into larger ones? Will we add yet another kind of compaction? I
think it's time for us to consider extensibility for the future while
proposing this feature.
Q3:
Currently all kinds of compaction use the query procedure to rewrite
all the records of the related segments.
Suppose we have a segment with 100 carbondata files and we delete only one
record in it. The penalty of rewriting all the records of this segment is heavy.
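A rough back-of-the-envelope sketch of the Q3 cost, assuming every file holds the same number of records (1,000,000 here is a made-up figure) and comparing the current segment-level rewrite with a hypothetical alternative that rewrites only the files carrying update/delete deltas. The function and class names are illustrative, not CarbonData APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentLayout:
    num_files: int         # carbondata files in the segment
    records_per_file: int  # assumed uniform, for illustration only


def records_processed_full(layout: SegmentLayout) -> int:
    """Segment-level compaction: every record in the segment is read and rewritten."""
    return layout.num_files * layout.records_per_file


def records_processed_affected_only(layout: SegmentLayout, files_with_deltas: int) -> int:
    """Hypothetical alternative: only files that have update/delete deltas are rewritten."""
    if not 0 <= files_with_deltas <= layout.num_files:
        raise ValueError("files_with_deltas must be between 0 and num_files")
    return files_with_deltas * layout.records_per_file


layout = SegmentLayout(num_files=100, records_per_file=1_000_000)
full = records_processed_full(layout)
partial = records_processed_affected_only(layout, files_with_deltas=1)
print(full, partial, full // partial)  # 100000000 1000000 100
```

Under these assumptions, one deleted record forces roughly 100 times more work at segment granularity than at file granularity.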