[GitHub] [carbondata] Zhangshunyu opened a new pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

Posted by GitBox on Nov 24, 2020; 4:02am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-Zhangshunyu-opened-a-new-pull-request-4020-CARBONDATA-4054-Support-data-size-contrn-tp103477.html


Zhangshunyu opened a new pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020


    ### Why is this PR needed?
    Currently, minor compaction considers only the number of segments, and major
   compaction considers only the total size of segments. Consider a scenario
   where the user wants minor compaction to be triggered by the number of
   segments but does not want to merge any segment whose data size exceeds a
   threshold, for example 2GB, because merging such a large segment is
   unnecessary and time-consuming.
   
    ### What changes were proposed in this PR?
   Add a parameter that sets the size threshold for segments included
   in minor compaction, so that the user can exclude a segment from minor
   compaction once its data size exceeds the threshold. The parameter can be
   set at the system level and at the table level; if it is not set, the
   default value is used.
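
The intended behaviour can be illustrated with a minimal sketch. This is not the CarbonData implementation: the names `Segment`, `resolve_threshold_mb`, and `minor_compaction_candidates` are hypothetical, the 2048 MB default is borrowed from the 2GB example above rather than from the code, and the precedence of the table-level setting over the system-level one is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed default, taken from the "2GB" example in the description.
DEFAULT_THRESHOLD_MB = 2048


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a table segment and its data size."""
    segment_id: str
    size_mb: int


def resolve_threshold_mb(table_level: Optional[int],
                         system_level: Optional[int],
                         default: int = DEFAULT_THRESHOLD_MB) -> int:
    """Pick the effective threshold.

    Assumption: a table-level value overrides a system-level value,
    and the default applies only when neither is set.
    """
    if table_level is not None:
        return table_level
    if system_level is not None:
        return system_level
    return default


def minor_compaction_candidates(segments: List[Segment],
                                threshold_mb: int) -> List[Segment]:
    """Drop segments whose data size exceeds the threshold.

    The remaining segments are the ones a count-based minor
    compaction would still be allowed to merge.
    """
    return [s for s in segments if s.size_mb <= threshold_mb]


if __name__ == "__main__":
    segments = [Segment("0", 100), Segment("1", 3000), Segment("2", 2048)]
    threshold = resolve_threshold_mb(table_level=None, system_level=None)
    for seg in minor_compaction_candidates(segments, threshold):
        print(seg.segment_id, seg.size_mb)
```

In this sketch a segment exactly at the threshold is still eligible, since the description excludes a segment only once its size exceeds the threshold.
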
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
