
Re: Proposal to integrate QATCodec into Carbondata

Posted by Jacky Li on Nov 01, 2018; 12:13pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Proposal-to-integrate-QATCodec-into-Carbondata-tp64916p67316.html

Hi,

Good to know about QATCodec. I have a quick question: is QATCodec an independent compression/decompression library, or does it depend on specific hardware to achieve the performance improvement you mentioned?

Is there a link to the QATCodec project or its source code?

Regards,
Jacky

> On Oct 12, 2018, at 10:40 AM, Xu, Cheng A <[hidden email]> wrote:
>
> Hi all,
> I would like to propose integrating QATCodec [1] into CarbonData. The QATCodec project provides a compression and decompression library that lets Apache Hadoop/Spark use Intel(R) QuickAssist Technology (QAT) [2] for compression/decompression. The project was open-sourced this year, along with its underlying native dependency, QATzip. Users can install the native dependencies with a Linux package manager (e.g. yum on CentOS). The project has two major benefits:
> 1) A wide ecosystem support
> It currently supports Hadoop and Spark directly by implementing their compression/decompression APIs, and it also provides patches to integrate with Parquet and ORC-Hive.
> 2) High performance and space efficiency
> We measured the performance and compression ratio of QATCodec against Snappy on several workloads.
> For the sort workload with MapReduce (input, intermediate data, and output all compression-enabled; 3TB data scale; 5 workers; 2 data replicas), QATCodec brings a 7.29% performance gain and a 7.5% better compression ratio. For the sort workload with Spark (input and intermediate data compression-enabled; 3TB data scale), it brings a 14.3% performance gain and a 7.5% better compression ratio. For Hive on MR with the TPCx-BB workload [3] (3TB data scale), it brings a 12.98% performance gain and a 13.65% better compression ratio.
> Regarding the hardware requirement, the current implementation falls back to a software implementation when no QAT device is present.
> CarbonData currently supports two compression codecs: Zstd and Snappy. I think an extra compression option with hardware acceleration would benefit users.
>
> Please feel free to share your comments on this proposal.
>
>
> [1] https://github.com/intel-hadoop/IntelQATCodec
> [2] https://01.org/zh/intel-quickassist-technology
> [3] http://www.tpc.org/tpcx-bb/default.asp
>
> Best Regards
> Ferdinand Xu
>
>