Enhancement on compaction performance

Posted by xuchuanyin on Nov 07, 2018; 3:54pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Enhancement-on-compaction-performance-tp67996.html

Hi all:
I am raising a PR to enhance the performance of compaction. The PR number is #2906.

Based on my experiments using about 72GB of LineItem data (from the 100GB TPC-H dataset), I got the following results.

Code Branch  Prefetch  Batch Size (default 100)  Load1 (s)  Load2 (s)  Load3 (s)  Compact 3 Loads (s)  Time Reduced
master       NA        100                       447.4      445.9      450.1      661.3                Baseline
master       NA        32000                     441.5      454.4      456.8      641.2                +3.0%
PR2906       enable    100                       445.3      450.2      445.3      411.8                +37.7%
PR2906       enable    32000                     438.7      446.8      441.8      333.1                +49.6%
PR2906       disable   100                       458.1      459.4      450.9      659.5                +0.3%
PR2906       disable   32000                     472.0      446.8      457.1      654.5                +1.0%
Note: These tests were run on Spark 2.2.

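The results show that compaction is almost twice as fast when configured properly: with prefetch enabled and a batch size of 32000, compacting the three loads takes 333.1 s instead of 661.3 s.
They also show that compaction performance does not degrade when this feature is disabled.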
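
For anyone who wants to re-check the "Time Reduced" figures or compare their own runs, the calculation can be sketched as below. This is only a small illustration in Python: the function names are made up for this example and are not part of CarbonData, and the numbers are the compaction times from the table above, with the master run at batch size 100 as the baseline.

```python
def time_reduced_pct(baseline_s, measured_s):
    """Percentage of compaction time saved relative to the baseline run."""
    return (baseline_s - measured_s) / baseline_s * 100.0


def speedup(baseline_s, measured_s):
    """How many times faster the measured run is than the baseline."""
    return baseline_s / measured_s


# Baseline: master branch, batch size 100, compaction of 3 loads (seconds).
BASELINE_S = 661.3

# (branch, prefetch, batch size) -> compaction time in seconds, from the table.
runs = {
    ("master", "NA", 32000): 641.2,
    ("PR2906", "enable", 100): 411.8,
    ("PR2906", "enable", 32000): 333.1,
    ("PR2906", "disable", 100): 659.5,
    ("PR2906", "disable", 32000): 654.5,
}

for (branch, prefetch, batch), secs in runs.items():
    print(
        f"{branch:7s} prefetch={prefetch:8s} batch={batch:<6d} "
        f"reduced={time_reduced_pct(BASELINE_S, secs):+.1f}% "
        f"speedup={speedup(BASELINE_S, secs):.2f}x"
    )
```

To compare a run on another cluster, replace BASELINE_S and the entries in runs with your own measured compaction times.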

So I propose the following:

1. I would like to make this feature enabled by default.

2. I would also like others in the community to test this feature and check whether they benefit from it.

Any feedback is welcome.