Proposal to integrate QATCodec into Carbondata
Posted by
Xu, Cheng A on
Oct 12, 2018; 2:40am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Proposal-to-integrate-QATCodec-into-Carbondata-tp64916p64917.html
Hi all,
I would like to propose integrating QATCodec [1] into CarbonData. The QATCodec project provides a compression and decompression library that lets Apache Hadoop and Spark use Intel(R) QuickAssist Technology (QAT) [2]. The project was open-sourced this year, along with its underlying native dependency, QATzip. Users can install the native dependencies with a Linux package manager (e.g. yum on CentOS). The project has two major benefits:
1) Wide ecosystem support
It currently supports Hadoop and Spark directly by implementing their compression/decompression APIs, and it also provides patches for integrating with Parquet and ORC-Hive.
2) High performance and space efficiency
We measured the performance and compression ratio of QATCodec against Snappy on several workloads.
For the sort workload with MapReduce (input, intermediate data, and output all compression-enabled; 3TB data scale; 5 workers; 2 replicas for data), QATCodec brings a 7.29% performance gain and a 7.5% better compression ratio. For the sort workload with Spark (input and intermediate data compression-enabled; 3TB data scale), it brings a 14.3% performance gain and a 7.5% better compression ratio. We also measured Hive on MR with the TPCx-BB workload [3] (3TB data scale), where it brings a 12.98% performance gain and a 13.65% better compression ratio.
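As a rough illustration of how percentages like these might be derived: the exact definitions behind the numbers above are not spelled out in this mail, so the formula below (relative reduction versus the Snappy baseline) and all sample values are assumptions for illustration only, not the measured data.

```python
def relative_gain(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is smaller than `baseline`.

    Works for any lower-is-better metric, e.g. job run time in seconds
    or compressed output size in bytes.
    """
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical sample values, NOT measurements from this proposal.
snappy_seconds, qat_seconds = 1000.0, 900.0
snappy_bytes, qat_bytes = 400.0, 370.0

performance_gain = relative_gain(snappy_seconds, qat_seconds)
size_reduction = relative_gain(snappy_bytes, qat_bytes)
print(f"performance gain: {performance_gain:.2f}%")
print(f"size reduction:   {size_reduction:.2f}%")
```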
Regarding hardware requirements, the current implementation falls back to a software implementation when no QAT device is present.
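The fallback idea can be sketched in a few lines. This is only a minimal illustration of the pattern, not QATCodec's actual code: `hardware_compress` and `HardwareUnavailable` are hypothetical names, the hardware path is simulated as always absent, and zlib stands in for the software implementation on the assumption that both paths emit the same format.

```python
import zlib


class HardwareUnavailable(Exception):
    """Raised by the hypothetical hardware path when no accelerator is present."""


def hardware_compress(data: bytes) -> bytes:
    # Hypothetical stand-in for an accelerator call.
    # Here it always reports that no device was found.
    raise HardwareUnavailable("no accelerator device detected")


def compress(data: bytes) -> bytes:
    """Try the hardware path first; fall back to software DEFLATE (zlib)."""
    try:
        return hardware_compress(data)
    except HardwareUnavailable:
        return zlib.compress(data)


payload = b"carbondata " * 1000
compressed = compress(payload)
# The software-compressed output stays readable by a plain zlib reader.
restored = zlib.decompress(compressed)
```

With this shape, callers see a single `compress` entry point and do not need to know whether a device was available.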
CarbonData currently supports two compression codecs: Zstd and Snappy. I think an additional compression option with hardware acceleration would benefit users.
Please feel free to share your comments on this proposal.
[1] https://github.com/intel-hadoop/IntelQATCodec
[2] https://01.org/zh/intel-quickassist-technology
[3] http://www.tpc.org/tpcx-bb/default.asp

Best Regards
Ferdinand Xu