Discussion: change default compressor to ZSTD

Posted by Jacky Li on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-change-default-compressor-to-ZSTD-tp91152.html

Hi,


I compared the snappy and zstd compressors for carbondata using TPCH.


For TPCH lineitem table:
              carbon-zstd   carbon-snappy
loading (s)   53            51
size          795MB         1.2GB

TPCH-query:
        carbon-zstd   carbon-snappy
Q1      4.289         8.29
Q2      12.609        12.986
Q3      14.902        14.458
Q4      6.276         5.954
Q5      23.147        21.946
Q6      1.12          0.945
Q7      23.017        28.007
Q8      14.554        15.077
Q9      28.472        27.473
Q10     24.067        24.682
Q11     3.321         3.79
Q12     5.311         5.185
Q13     14.08         11.84
Q14     2.262         2.087
Q15     5.496         4.772
Q16     29.919        29.833
Q17     7.018         7.057
Q18     17.367        17.795
Q19     2.931         2.865
Q20     11.347        10.937
Q21     26.416        28.414
Q22     5.923         6.311
sum     283.844       290.704


As you can see, with zstd the table size is about 33% smaller than with snappy, while the differences in data loading time (53 s vs 51 s) and total query time (283.844 vs 290.704) are negligible. So I suggest changing the default compressor in carbondata from snappy to zstd.
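
The figures behind this comparison can be rechecked with a short Python sketch. All numbers are copied from the tables above; the only assumption is the unit conversion for "1.2GB" (taken here as 1 GB = 1000 MB, and the value itself is already rounded), so the size result is approximate.

```python
# Recompute the headline comparisons from the numbers reported in this post.
ZSTD_SIZE_MB = 795.0
SNAPPY_SIZE_MB = 1.2 * 1000  # "1.2GB"; assumes 1 GB = 1000 MB (assumption)

ZSTD_LOAD_S = 53.0
SNAPPY_LOAD_S = 51.0

# Per-query times for Q1..Q22 as (carbon-zstd, carbon-snappy).
QUERY_TIMES = [
    (4.289, 8.29), (12.609, 12.986), (14.902, 14.458), (6.276, 5.954),
    (23.147, 21.946), (1.12, 0.945), (23.017, 28.007), (14.554, 15.077),
    (28.472, 27.473), (24.067, 24.682), (3.321, 3.79), (5.311, 5.185),
    (14.08, 11.84), (2.262, 2.087), (5.496, 4.772), (29.919, 29.833),
    (7.018, 7.057), (17.367, 17.795), (2.931, 2.865), (11.347, 10.937),
    (26.416, 28.414), (5.923, 6.311),
]


def percent_change(new: float, old: float) -> float:
    """Relative change of `new` against `old`, in percent (negative = smaller)."""
    return (new - old) / old * 100.0


zstd_total = sum(z for z, _ in QUERY_TIMES)
snappy_total = sum(s for _, s in QUERY_TIMES)

size_change = percent_change(ZSTD_SIZE_MB, SNAPPY_SIZE_MB)
load_change = percent_change(ZSTD_LOAD_S, SNAPPY_LOAD_S)
query_change = percent_change(zstd_total, snappy_total)

print(f"size:  {size_change:+.1f}% (zstd vs snappy)")
print(f"load:  {load_change:+.1f}%")
print(f"query: {query_change:+.1f}% (totals {zstd_total:.3f} vs {snappy_total:.3f})")
```

Under that assumption the size comes out roughly a third smaller, loading is a few percent slower, and the total query time is a few percent faster, which is consistent with the summary above.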


To change the default compressor, we need to:
1. Append the compressor name to the carbondata file name, so that users can tell from the file name which compressor was used.
For example, the file name will change from
 part-0-0_batchno0-0-0-1580982686749.carbondata to part-0-0_batchno0-0-0-1580982686749.snappy.carbondata or part-0-0_batchno0-0-0-1580982686749.zstd.carbondata


2. Change the compressor constant in CarbonCommonConstants.java to use zstd as the default compressor.
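
The naming scheme in step 1 could look roughly like the Python sketch below. It is only an illustration of the proposed file name format, not carbondata code: the helper names `with_compressor` and `compressor_of` are hypothetical, and treating a name without a compressor part as "unknown" (rather than assuming snappy) is my assumption.

```python
from typing import Optional

CARBON_EXT = ".carbondata"
# The two compressors compared in this post.
KNOWN_COMPRESSORS = ("snappy", "zstd")


def compressor_of(file_name: str) -> Optional[str]:
    """Return the compressor encoded in the file name, or None for old-style names."""
    if not file_name.endswith(CARBON_EXT):
        raise ValueError(f"not a carbondata file: {file_name}")
    stem = file_name[: -len(CARBON_EXT)]
    for compressor in KNOWN_COMPRESSORS:
        if stem.endswith("." + compressor):
            return compressor
    return None  # old-style name: the compressor is not visible in the name


def with_compressor(file_name: str, compressor: str) -> str:
    """Insert the compressor name just before the .carbondata extension."""
    if compressor not in KNOWN_COMPRESSORS:
        raise ValueError(f"unknown compressor: {compressor}")
    if compressor_of(file_name) is not None:
        raise ValueError(f"file name already carries a compressor: {file_name}")
    stem = file_name[: -len(CARBON_EXT)]
    return f"{stem}.{compressor}{CARBON_EXT}"


old_name = "part-0-0_batchno0-0-0-1580982686749.carbondata"
new_name = with_compressor(old_name, "zstd")
print(new_name)
print(compressor_of(new_name), compressor_of(old_name))
```

Any reader that parses file names would need to accept both the old and the new form, since files written before the change would keep the old names.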


What do you think?


Regards,
Jacky