
Re: Enhancement on compaction performance

Posted by Jacky Li on Nov 08, 2018; 9:16am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Enhancement-on-compaction-performance-tp67996p68033.html

Hi Xuchuanyin,

This feature is great for compaction. Since it prefetches data into memory, did you observe higher memory usage? Do you have any numbers?
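
To make the memory question concrete, here is a minimal, hypothetical Python sketch of batch prefetching in general. It is not the PR #2906 implementation, and `prefetching_batches`, `estimated_buffered_rows` and the `depth` parameter are illustrative names. A background thread reads batches ahead into a bounded queue. Under this model, the extra memory grows with batch size times queue depth, so larger batch sizes (such as 32000 versus the default 100) would hold more rows in memory at once.

```python
import queue
import threading
from typing import Iterable, Iterator, List

_SENTINEL = object()  # marks the end of the input stream


def prefetching_batches(rows: Iterable[int], batch_size: int, depth: int) -> Iterator[List[int]]:
    """Yield batches of rows while a background thread reads ahead.

    Hypothetical sketch: at most `depth` items wait in the queue, so the
    prefetcher cannot run arbitrarily far ahead of the consumer.
    """
    q: "queue.Queue" = queue.Queue(maxsize=depth)

    def producer() -> None:
        batch: List[int] = []
        for row in rows:
            batch.append(row)
            if len(batch) == batch_size:
                q.put(batch)  # blocks while the queue is full
                batch = []
        if batch:
            q.put(batch)  # last, possibly partial, batch
        q.put(_SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item


def estimated_buffered_rows(batch_size: int, depth: int) -> int:
    """Rough upper bound on rows held at once, assuming the consumer keeps
    only its current batch: `depth` queued batches, one batch the producer
    is filling or blocked on, and one batch being consumed."""
    return (depth + 2) * batch_size
```

Usage: `for batch in prefetching_batches(range(10), batch_size=3, depth=2): ...` yields `[0, 1, 2]`, `[3, 4, 5]`, `[6, 7, 8]` and `[9]`. The bounded queue is the design choice that trades a fixed amount of memory for overlapping reads with processing.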

Regards,
Jacky

> On Nov 7, 2018, at 11:54 PM, xuchuanyin <[hidden email]> wrote:
>
> Hi all:
> I am raising a PR, #2906, to improve compaction performance.
>
> Based on my experiments with about 72 GB of LineItem data (from a 100 GB TPC-H dataset), I got the following results:
>
> Code branch  Prefetch  Batch size (default 100)  Load1 (s)  Load2 (s)  Load3 (s)  Compact 3 loads (s)  Time reduced
> master       NA        100                       447.4      445.9      450.1      661.3                baseline
> master       NA        32000                     441.5      454.4      456.8      641.2                +3.0%
> PR2906       enable    100                       445.3      450.2      445.3      411.8                +37.7%
> PR2906       enable    32000                     438.7      446.8      441.8      333.1                +49.6%
> PR2906       disable   100                       458.1      459.4      450.9      659.5                +0.3%
> PR2906       disable   32000                     472.0      446.8      457.1      654.5                +1.0%
> Note: these tests were run on Spark 2.2.
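
The "Time reduced" column appears to be computed as (baseline - t) / baseline, with the master run at batch size 100 (661.3 s) as the baseline. A short Python check under that assumption reproduces the reported percentages and the corresponding speedup; the names `BASELINE_S`, `compaction_seconds` and `time_reduced` are illustrative only.

```python
# Compaction time for 3 loads, in seconds, copied from the table above.
BASELINE_S = 661.3  # master, batch size 100

compaction_seconds = {
    ("master", "NA", 100): 661.3,
    ("master", "NA", 32000): 641.2,
    ("PR2906", "enable", 100): 411.8,
    ("PR2906", "enable", 32000): 333.1,
    ("PR2906", "disable", 100): 659.5,
    ("PR2906", "disable", 32000): 654.5,
}


def time_reduced(seconds: float, baseline: float = BASELINE_S) -> float:
    """Percentage of compaction time saved relative to the baseline run."""
    return (baseline - seconds) / baseline * 100.0


for (branch, prefetch, batch), secs in compaction_seconds.items():
    speedup = BASELINE_S / secs
    print(f"{branch:7} {prefetch:8} {batch:6} {time_reduced(secs):+5.1f}%  {speedup:.2f}x")
```

The best configuration (prefetch enabled, batch size 32000) saves about 49.6% of the time, which is a speedup of roughly 1.99x.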
>
> The results show that, when configured properly (prefetch enabled, batch size 32000), compaction is almost twice as fast (333.1 s vs. 661.3 s).
> They also show that even when this feature is disabled, compaction performance does not degrade.
>
> Therefore:
>
> 1. I would like to make this feature enabled by default.
>
> 2. I would also like others in the community to test this feature and check whether they benefit from it.
>
> Any feedback is welcome.
>
>