http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-change-default-compressor-to-ZSTD-tp91152p92131.html
Then for PR3606, I will only add the compressor name to the file name and will not change the default compressor to ZSTD.
> On Feb 20, 2020, at 12:52 PM, Ajantha Bhat <[hidden email]> wrote:
>
> Hi Jacky and Ravindra,
>
> We have tested ZSTD vs snappy again with the latest code on a 3-node Spark
> 2.3 cluster on HDFS with TPCH 500 GB data.
> Below is the summary:
>
> *1. ZSTD store is 28.8% smaller compared to snappy*
> *2. Overall query time is degraded by 18.35% in ZSTD compared to snappy*
> *3. Load time in ZSTD has negligible degradation of 0.7 % compared to
> snappy*
>
> Based on this, I guess we cannot use ZSTD as default due to huge
> degradation in query time.
>
> Thanks,
> Ajantha
>
>
>
>
> On Fri, Feb 7, 2020 at 4:54 PM Ravindra Pesala <[hidden email]> wrote:
>
>> Hi Jacky,
>>
>> As per the original PR https://github.com/apache/carbondata/pull/2628 ,
>> query performance decreased by 20% ~ 50% compared to snappy. So I am
>> concerned about the performance. It would be better to have a proper TPCH
>> performance report on the regular cluster, like we do for every version,
>> and decide based on that.
>>
>> Regards,
>> Ravindra.
>>
>> On Fri, 7 Feb 2020 at 10:40 AM, Jacky Li <[hidden email]> wrote:
>>
>>> Hi Ajantha,
>>>
>>>
>>> Yes, the decoder will use the compressorName stored in ChunkCompressionMeta
>>> from the file header,
>>> but I think it is better to put it in the name so that the user can see the
>>> compressor from the shell without launching an engine to read it.
>>>
>>>
>>> In spark, for parquet/orc the file name written
>>> is: part-00115-e2758995-4b10-4bd2-bf15-b4c176e587fe-c000.snappy.orc
>>>
>>>
>>> In PR3606, I will handle the compatibility.
>>>
>>>
>>> Regards,
>>> Jacky
>>>
>>>
>>> ------------------ Original Message ------------------
>>> From: "Ajantha Bhat"<[hidden email]>;
>>> Sent: Thursday, Feb 6, 2020, 11:51 PM
>>> To: "dev"<[hidden email]>;
>>>
>>> Subject: Re: Discussion: change default compressor to ZSTD
>>>
>>>
>>>
>>> Hi,
>>>
>>> 33% is a huge reduction in store size. If there is negligible difference in
>>> load and query time, we should definitely go for it.
>>>
>>> And does the user really need to know which compression is used? A change
>>> in the file name may need compatibility handling.
>>> The thrift *FileHeader, ChunkCompressionMeta* already stores the compressor
>>> name. Query-time decoding can be based on this.
>>>
>>> Thanks,
>>> Ajantha
>>>
>>>
>>> On Thu, Feb 6, 2020 at 4:27 PM Jacky Li <[hidden email]> wrote:
>>>
>>> > Hi,
>>> >
>>> >
>>> > I compared the snappy and zstd compressors using TPCH for carbondata.
>>> >
>>> >
>>> > For TPCH lineitem table:
>>> >               carbon-zstd   carbon-snappy
>>> > loading (s)   53            51
>>> > size          795MB         1.2GB
>>> >
>>> > TPCH-query:
>>> >               carbon-zstd   carbon-snappy
>>> > Q1            4.289         8.29
>>> > Q2            12.609        12.986
>>> > Q3            14.902        14.458
>>> > Q4            6.276         5.954
>>> > Q5            23.147        21.946
>>> > Q6            1.12          0.945
>>> > Q7            23.017        28.007
>>> > Q8            14.554        15.077
>>> > Q9            28.472        27.473
>>> > Q10           24.067        24.682
>>> > Q11           3.321         3.79
>>> > Q12           5.311         5.185
>>> > Q13           14.08         11.84
>>> > Q14           2.262         2.087
>>> > Q15           5.496         4.772
>>> > Q16           29.919        29.833
>>> > Q17           7.018         7.057
>>> > Q18           17.367        17.795
>>> > Q19           2.931         2.865
>>> > Q20           11.347        10.937
>>> > Q21           26.416        28.414
>>> > Q22           5.923         6.311
>>> > sum           283.844       290.704
>>> >
>>> >
>>> > As you can see, after using zstd, the table size is reduced by 33% compared
>>> > to snappy, and the difference in data loading and query time is negligible.
>>> > So I suggest changing the default compressor in carbondata from snappy to
>>> > zstd.
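
The relative differences behind this summary can be rechecked with a few lines of arithmetic. The sketch below uses only the loading time, size, and query-time sum reported in this message; treating 1.2GB as 1200 MB is an assumption made here, and `pct_change` is a helper name introduced for illustration.

```python
# Figures copied from the zstd vs snappy comparison in this message.
# Assumption: 1.2GB is treated as 1200 MB (decimal units).
zstd = {"loading_s": 53.0, "size_mb": 795.0, "query_sum": 283.844}
snappy = {"loading_s": 51.0, "size_mb": 1200.0, "query_sum": 290.704}


def pct_change(new, base):
    """Relative change of `new` versus `base` in percent (negative means smaller)."""
    return (new - base) / base * 100.0


# Print how zstd compares to snappy for each reported metric.
for metric in ("loading_s", "size_mb", "query_sum"):
    print(f"{metric}: {pct_change(zstd[metric], snappy[metric]):+.2f}% vs snappy")
```

Under that unit assumption the size reduction comes out close to the quoted 33%, loading is a few percent slower, and the summed query time is a few percent lower. The sum hides per-query spread (for example Q1 and Q13 move in opposite directions), so a per-query comparison may be more informative than the total.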
>>> >
>>> >
>>> > To change the default compressor, we need to:
>>> > 1. Append the compressor name to the carbondata file name, so that the
>>> > user can tell from the file name which compressor is used.
>>> > For example, the file name will change from
>>> > part-0-0_batchno0-0-0-1580982686749.carbondata
>>> > to part-0-0_batchno0-0-0-1580982686749.snappy.carbondata
>>> > or part-0-0_batchno0-0-0-1580982686749.zstd.carbondata
>>> >
>>> >
>>> > 2. Change the compressor constant in the CarbonCommonConstants.java file
>>> > to use zstd as the default compressor
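
As a rough illustration of the naming scheme in step 1, the sketch below reads the compressor from a file name and tolerates the older form that has no compressor segment. It is not CarbonData code: the function name, the `KNOWN_COMPRESSORS` set, and the choice to return `None` for old-style names (so that a caller would fall back to the compressor name stored in the file header) are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical set of compressor names; only the two discussed in this thread.
KNOWN_COMPRESSORS = {"snappy", "zstd"}
SUFFIX = ".carbondata"


def compressor_from_file_name(file_name: str) -> Optional[str]:
    """Return the compressor segment of a carbondata file name, if present.

    Returns None for old-style names without a compressor segment; in that
    case the caller is assumed to read the compressor from the file header.
    """
    if not file_name.endswith(SUFFIX):
        raise ValueError(f"not a carbondata file: {file_name}")
    stem = file_name[: -len(SUFFIX)]
    # The compressor, when present, is the last dot-separated segment of the stem.
    _, sep, tail = stem.rpartition(".")
    if sep and tail in KNOWN_COMPRESSORS:
        return tail
    return None


print(compressor_from_file_name("part-0-0_batchno0-0-0-1580982686749.zstd.carbondata"))
print(compressor_from_file_name("part-0-0_batchno0-0-0-1580982686749.carbondata"))
```

Returning `None` instead of guessing a default keeps old files readable without assuming which compressor wrote them.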
>>> >
>>> >
>>> > What do you think?
>>> >
>>> >
>>> > Regards,
>>> > Jacky
>>
>> --
>> Thanks & Regards,
>> Ravi
>>