[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor


[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2847
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1674/



---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user shardul-cr7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240135884
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java ---
    @@ -35,8 +35,8 @@
       private final Map<String, Compressor> allSupportedCompressors = new HashMap<>();
     
       public enum NativeSupportedCompressor {
    -    SNAPPY("snappy", SnappyCompressor.class),
    -    ZSTD("zstd", ZstdCompressor.class);
    +    SNAPPY("snappy", SnappyCompressor.class), ZSTD("zstd", ZstdCompressor.class), GZIP("gzip",
    --- End diff --
   
    Done.



---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240147384
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +/**
    + * Codec Class for performing Gzip Compression
    + */
    +public class GzipCompressor extends AbstractCompressor {
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /**
    +   * This method takes the Byte Array data and Compresses in gzip format
    +   *
    +   * @param data Data Byte Array passed for compression
    +   * @return Compressed Byte Array
    +   */
    +  private byte[] compressData(byte[] data) {
    +    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    +    try {
    +      GzipCompressorOutputStream gzipCompressorOutputStream =
    +          new GzipCompressorOutputStream(byteArrayOutputStream);
    +      try {
    +        /**
    +         * Below api will write bytes from specified byte array to the gzipCompressorOutputStream
    +         * The output stream will compress the given byte array.
    +         */
    +        gzipCompressorOutputStream.write(data);
    +      } catch (IOException e) {
    +        throw new RuntimeException("Error during Compression step " + e.getMessage());
    --- End diff --
   
    Don't discard the actual exception. Add the original exception as the cause of the RuntimeException as well.
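For illustration, a minimal sketch of the fix this review asks for: pass the caught `IOException` as the cause so its stack trace is preserved, instead of appending only `e.getMessage()`. The class and method names are hypothetical, and the JDK's `java.util.zip.GZIPOutputStream` stands in for commons-compress so the snippet is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not the PR's exact code.
public class CompressSketch {
  static byte[] compress(byte[] data) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
      gzip.write(data);
    } catch (IOException e) {
      // Chain 'e' as the cause rather than flattening it into the message string.
      throw new RuntimeException("Error during Compression step", e);
    }
    return out.toByteArray();
  }
}
```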


---

[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2847
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1887/



---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user shardul-cr7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240157144
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +/**
    + * Codec Class for performing Gzip Compression
    + */
    +public class GzipCompressor extends AbstractCompressor {
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /**
    +   * This method takes the Byte Array data and Compresses in gzip format
    +   *
    +   * @param data Data Byte Array passed for compression
    +   * @return Compressed Byte Array
    +   */
    +  private byte[] compressData(byte[] data) {
    +    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    +    try {
    +      GzipCompressorOutputStream gzipCompressorOutputStream =
    +          new GzipCompressorOutputStream(byteArrayOutputStream);
    +      try {
    +        /**
    +         * Below api will write bytes from specified byte array to the gzipCompressorOutputStream
    +         * The output stream will compress the given byte array.
    +         */
    +        gzipCompressorOutputStream.write(data);
    +      } catch (IOException e) {
    +        throw new RuntimeException("Error during Compression step " + e.getMessage());
    --- End diff --
   
    OK, added the actual exception as the cause.


---

[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2847
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1678/



---

[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2847
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1890/



---

[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2847
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9938/



---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240206699
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +/**
    + * Codec Class for performing Gzip Compression
    + */
    +public class GzipCompressor extends AbstractCompressor {
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /**
    +   * This method takes the Byte Array data and Compresses in gzip format
    +   *
    +   * @param data Data Byte Array passed for compression
    +   * @return Compressed Byte Array
    +   */
    +  private byte[] compressData(byte[] data) {
    +    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    +    try {
    +      GzipCompressorOutputStream gzipCompressorOutputStream =
    +          new GzipCompressorOutputStream(byteArrayOutputStream);
    +      try {
    +        /**
    +         * Below api will write bytes from specified byte array to the gzipCompressorOutputStream
    +         * The output stream will compress the given byte array.
    +         */
    +        gzipCompressorOutputStream.write(data);
    +      } catch (IOException e) {
    +        throw new RuntimeException("Error during Compression writing step ", e);
    +      } finally {
    +        gzipCompressorOutputStream.close();
    +      }
    +    } catch (IOException e) {
    +      throw new RuntimeException("Error during Compression step ", e);
    +    }
    +    return byteArrayOutputStream.toByteArray();
    +  }
    +
    +  /**
    +   * This method takes the Byte Array data and Decompresses in gzip format
    +   *
    +   * @param data   Data Byte Array for Compression
    +   * @param offset Start value of Data Byte Array
    +   * @param length Size of Byte Array
    +   * @return
    +   */
    +  private byte[] decompressData(byte[] data, int offset, int length) {
    +    ByteArrayInputStream byteArrayOutputStream = new ByteArrayInputStream(data, offset, length);
    +    ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
    +    try {
    +      GzipCompressorInputStream gzipCompressorInputStream =
    +          new GzipCompressorInputStream(byteArrayOutputStream);
    +      byte[] buffer = new byte[1024];
    --- End diff --
   
    Instead of a fixed 1024, can you check what block size (in bytes) gzip operates on and use that value?
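To sketch what this suggestion could look like: DEFLATE (the algorithm underlying gzip) uses a 32 KB sliding window, so a 32 KB read buffer is a natural candidate to replace the fixed 1024. The class name and the `BUFFER_SIZE` value are assumptions for illustration; the JDK's gzip streams stand in for commons-compress to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the decompress loop under review.
public class DecompressSketch {
  static final int BUFFER_SIZE = 32 * 1024; // assumed; matches deflate's 32 KB window

  static byte[] decompress(byte[] data) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
      byte[] buffer = new byte[BUFFER_SIZE];
      int len;
      while ((len = in.read(buffer)) != -1) {
        out.write(buffer, 0, len); // append each decoded chunk to the result
      }
    } catch (IOException e) {
      throw new RuntimeException("Error during Decompression step", e);
    }
    return out.toByteArray();
  }

  // Round-trip helper for demonstration only.
  static byte[] compress(byte[] data) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
      gzip.write(data);
    } catch (IOException e) {
      throw new RuntimeException("Error during Compression step", e);
    }
    return out.toByteArray();
  }
}
```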


---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240208519
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +/**
    + * Codec Class for performing Gzip Compression
    + */
    +public class GzipCompressor extends AbstractCompressor {
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /**
    +   * This method takes the Byte Array data and Compresses in gzip format
    +   *
    +   * @param data Data Byte Array passed for compression
    +   * @return Compressed Byte Array
    +   */
    +  private byte[] compressData(byte[] data) {
    +    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    --- End diff --
   
    ByteArrayOutputStream initializes with a capacity of 32 and copies the data to a new byte[] on each expansion. Can you use a better initial size to limit the number of copies during expansion? Snappy has a utility (maxCompressedLength) to calculate this; please check whether any gzip library has a similar method. If not, we can use a value based on a test with the maximum possible compression ratio.
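As a rough illustration of the idea: the JDK exposes no gzip equivalent of Snappy's `maxCompressedLength`, but an upper bound can be derived from the DEFLATE format — an incompressible input falls back to stored blocks of at most 64 KB with about 5 bytes of overhead each, plus a 10-byte gzip header and 8-byte trailer. The class, method, and exact constants below are assumptions sketched from that format, not an existing API.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: estimate a worst-case output size to pre-size the stream.
public class CapacityEstimate {
  static int maxGzipOutputSize(int inputSize) {
    // ~5 bytes per stored deflate block (64 KB max each) + 18 bytes gzip header/trailer.
    int deflateOverhead = (inputSize / (64 * 1024) + 1) * 5;
    return inputSize + deflateOverhead + 18;
  }

  static ByteArrayOutputStream newOutputStream(int inputSize) {
    // Pre-sizing avoids the default 32-byte start and repeated array copies on growth.
    return new ByteArrayOutputStream(maxGzipOutputSize(inputSize));
  }
}
```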


---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240211334
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ---
    @@ -252,50 +253,94 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
            """.stripMargin)
       }
     
    -  test("test data loading with snappy compressor and offheap") {
    +  test("test data loading with different compressors and offheap") {
    +    for(comp <- compressors){
    +      CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
    --- End diff --
   
    Should we have a UT for enable.unsafe.in.query.processing set to both true and false?


---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user shardul-cr7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240227006
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +/**
    + * Codec Class for performing Gzip Compression
    + */
    +public class GzipCompressor extends AbstractCompressor {
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /**
    +   * This method takes the Byte Array data and Compresses in gzip format
    +   *
    +   * @param data Data Byte Array passed for compression
    +   * @return Compressed Byte Array
    +   */
    +  private byte[] compressData(byte[] data) {
    +    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    --- End diff --
   
    Based on the observations, I have initialized the byteArrayOutputStream with half the size of the byte buffer, which reduces the number of times the stream is resized.


---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user shardul-cr7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240236269
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.nio.ByteBuffer;
    +import java.nio.DoubleBuffer;
    +import java.nio.FloatBuffer;
    +import java.nio.IntBuffer;
    +import java.nio.LongBuffer;
    +import java.nio.ShortBuffer;
    +
    +import org.apache.carbondata.core.util.ByteUtil;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +public class GzipCompressor implements Compressor {
    +
    +  public GzipCompressor() {
    +  }
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /*
    +   * Method called for compressing the data and
    +   * return a byte array
    +   */
    +  private byte[] compressData(byte[] data) {
    +
    +    ByteArrayOutputStream bt = new ByteArrayOutputStream();
    +    try {
    +      GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
    +      try {
    +        gzos.write(data);
    +      } catch (IOException e) {
    +        e.printStackTrace();
    +      } finally {
    +        gzos.close();
    +      }
    +    } catch (IOException e) {
    +      e.printStackTrace();
    +    }
    +
    +    return bt.toByteArray();
    +  }
    +
    +  /*
    +   * Method called for decompressing the data and
    +   * return a byte array
    +   */
    +  private byte[] decompressData(byte[] data) {
    +
    +    ByteArrayInputStream bt = new ByteArrayInputStream(data);
    +    ByteArrayOutputStream bot = new ByteArrayOutputStream();
    +
    +    try {
    +      GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
    +      byte[] buffer = new byte[1024];
    +      int len;
    +
    +      while ((len = gzis.read(buffer)) != -1) {
    +        bot.write(buffer, 0, len);
    +      }
    +
    +    } catch (IOException e) {
    +      e.printStackTrace();
    +    }
    +
    +    return bot.toByteArray();
    +  }
    +
    +  @Override public byte[] compressByte(byte[] unCompInput) {
    +    return compressData(unCompInput);
    +  }
    +
    +  @Override public byte[] compressByte(byte[] unCompInput, int byteSize) {
    +    return compressData(unCompInput);
    +  }
    +
    +  @Override public byte[] unCompressByte(byte[] compInput) {
    +    return decompressData(compInput);
    +  }
    +
    +  @Override public byte[] unCompressByte(byte[] compInput, int offset, int length) {
    +    byte[] data = new byte[length];
    +    System.arraycopy(compInput, offset, data, 0, length);
    +    return decompressData(data);
    +  }
    +
    +  @Override public byte[] compressShort(short[] unCompInput) {
    +    ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_SHORT);
    +    unCompBuffer.asShortBuffer().put(unCompInput);
    +    return compressData(unCompBuffer.array());
    +  }
    +
    +  @Override public short[] unCompressShort(byte[] compInput, int offset, int length) {
    +    byte[] unCompArray = unCompressByte(compInput, offset, length);
    +    ShortBuffer unCompBuffer = ByteBuffer.wrap(unCompArray).asShortBuffer();
    +    short[] shorts = new short[unCompArray.length / ByteUtil.SIZEOF_SHORT];
    +    unCompBuffer.get(shorts);
    +    return shorts;
    +  }
    +
    +  @Override public byte[] compressInt(int[] unCompInput) {
    +    ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_INT);
    +    unCompBuffer.asIntBuffer().put(unCompInput);
    +    return compressData(unCompBuffer.array());
    +  }
    +
    +  @Override public int[] unCompressInt(byte[] compInput, int offset, int length) {
    +    byte[] unCompArray = unCompressByte(compInput, offset, length);
    +    IntBuffer unCompBuffer = ByteBuffer.wrap(unCompArray).asIntBuffer();
    +    int[] ints = new int[unCompArray.length / ByteUtil.SIZEOF_INT];
    +    unCompBuffer.get(ints);
    +    return ints;
    +  }
    +
    +  @Override public byte[] compressLong(long[] unCompInput) {
    +    ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_LONG);
    +    unCompBuffer.asLongBuffer().put(unCompInput);
    +    return compressData(unCompBuffer.array());
    +  }
    +
    +  @Override public long[] unCompressLong(byte[] compInput, int offset, int length) {
    +    byte[] unCompArray = unCompressByte(compInput, offset, length);
    +    LongBuffer unCompBuffer = ByteBuffer.wrap(unCompArray).asLongBuffer();
    +    long[] longs = new long[unCompArray.length / ByteUtil.SIZEOF_LONG];
    +    unCompBuffer.get(longs);
    +    return longs;
    +  }
    +
    +  @Override public byte[] compressFloat(float[] unCompInput) {
    +    ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_FLOAT);
    +    unCompBuffer.asFloatBuffer().put(unCompInput);
    +    return compressData(unCompBuffer.array());
    +  }
    +
    +  @Override public float[] unCompressFloat(byte[] compInput, int offset, int length) {
    +    byte[] unCompArray = unCompressByte(compInput, offset, length);
    +    FloatBuffer unCompBuffer = ByteBuffer.wrap(unCompArray).asFloatBuffer();
    +    float[] floats = new float[unCompArray.length / ByteUtil.SIZEOF_FLOAT];
    +    unCompBuffer.get(floats);
    +    return floats;
    +  }
    +
    +  @Override public byte[] compressDouble(double[] unCompInput) {
    +    ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_DOUBLE);
    +    unCompBuffer.asDoubleBuffer().put(unCompInput);
    +    return compressData(unCompBuffer.array());
    +  }
    +
    +  @Override public double[] unCompressDouble(byte[] compInput, int offset, int length) {
    +    byte[] unCompArray = unCompressByte(compInput, offset, length);
    +    DoubleBuffer unCompBuffer = ByteBuffer.wrap(unCompArray).asDoubleBuffer();
    +    double[] doubles = new double[unCompArray.length / ByteUtil.SIZEOF_DOUBLE];
    +    unCompBuffer.get(doubles);
    +    return doubles;
    +  }
    +
    +  @Override public long rawCompress(long inputAddress, int inputSize, long outputAddress)
    +      throws IOException {
    +    throw new RuntimeException("Not implemented rawUncompress for gzip yet");
    +  }
    +
    +  @Override public long rawUncompress(byte[] input, byte[] output) throws IOException {
    +    //gzip api doesnt have rawCompress yet.
    --- End diff --
   
    Done.
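One small nit visible in the quoted stubs: `rawCompress` throws a message naming `rawUncompress`. A hedged sketch of a clearer pattern, with hypothetical class and message text, keeps each message matched to its method and uses `UnsupportedOperationException` for unimplemented APIs:

```java
// Hypothetical sketch; the real interface lives in
// org.apache.carbondata.core.datastore.compression.Compressor.
public class RawStubs {
  public long rawCompress(long inputAddress, int inputSize, long outputAddress) {
    // Message names the method actually being called.
    throw new UnsupportedOperationException("rawCompress is not supported for gzip");
  }

  public long rawUncompress(byte[] input, byte[] output) {
    throw new UnsupportedOperationException("rawUncompress is not supported for gzip");
  }
}
```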


---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user shardul-cr7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240236381
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.nio.ByteBuffer;
    +import java.nio.DoubleBuffer;
    +import java.nio.FloatBuffer;
    +import java.nio.IntBuffer;
    +import java.nio.LongBuffer;
    +import java.nio.ShortBuffer;
    +
    +import org.apache.carbondata.core.util.ByteUtil;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +public class GzipCompressor implements Compressor {
    +
    +  public GzipCompressor() {
    +  }
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /*
    +   * Method called for compressing the data and
    +   * return a byte array
    +   */
    +  private byte[] compressData(byte[] data) {
    +
    +    ByteArrayOutputStream bt = new ByteArrayOutputStream();
    +    try {
    +      GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
    +      try {
    +        gzos.write(data);
    +      } catch (IOException e) {
    +        e.printStackTrace();
    +      } finally {
    +        gzos.close();
    +      }
    +    } catch (IOException e) {
    +      e.printStackTrace();
    --- End diff --
   
    Done.


---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user shardul-cr7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240236462
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.nio.ByteBuffer;
    +import java.nio.DoubleBuffer;
    +import java.nio.FloatBuffer;
    +import java.nio.IntBuffer;
    +import java.nio.LongBuffer;
    +import java.nio.ShortBuffer;
    +
    +import org.apache.carbondata.core.util.ByteUtil;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +public class GzipCompressor implements Compressor {
    +
    +  public GzipCompressor() {
    +  }
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /*
    +   * Method called for compressing the data and
    +   * return a byte array
    +   */
    +  private byte[] compressData(byte[] data) {
    +
    +    ByteArrayOutputStream bt = new ByteArrayOutputStream();
    +    try {
    +      GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
    +      try {
    +        gzos.write(data);
    +      } catch (IOException e) {
    +        e.printStackTrace();
    +      } finally {
    +        gzos.close();
    +      }
    +    } catch (IOException e) {
    +      e.printStackTrace();
    +    }
    +
    +    return bt.toByteArray();
    +  }
    +
    +  /*
    +   * Method called for decompressing the data and
    +   * return a byte array
    +   */
    +  private byte[] decompressData(byte[] data) {
    +
    +    ByteArrayInputStream bt = new ByteArrayInputStream(data);
    +    ByteArrayOutputStream bot = new ByteArrayOutputStream();
    +
    +    try {
    +      GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
    +      byte[] buffer = new byte[1024];
    +      int len;
    +
    +      while ((len = gzis.read(buffer)) != -1) {
    +        bot.write(buffer, 0, len);
    +      }
    +
    +    } catch (IOException e) {
    +      e.printStackTrace();
    --- End diff --
   
    Done.
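The "Done." above acknowledges the reviewer's objection to swallowing exceptions with `e.printStackTrace()` in the compress/decompress paths. A minimal sketch of the pattern the review converges on, using try-with-resources and rethrowing instead of printing (this uses the JDK's own `java.util.zip` gzip streams as a stand-in for the commons-compress classes in the actual patch, and the class and method names here are illustrative, not CarbonData's):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

  // Compress with try-with-resources so the stream is always closed,
  // and propagate failures instead of swallowing them with printStackTrace().
  static byte[] compress(byte[] data) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gzos = new GZIPOutputStream(bos)) {
      gzos.write(data);
    } catch (IOException e) {
      throw new RuntimeException("Error during gzip compression", e);
    }
    // Safe here: the try-with-resources block has already closed gzos,
    // so the gzip trailer has been flushed into bos.
    return bos.toByteArray();
  }

  static byte[] decompress(byte[] data) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(data))) {
      byte[] buffer = new byte[1024];
      int len;
      while ((len = gzis.read(buffer)) != -1) {
        bos.write(buffer, 0, len);
      }
    } catch (IOException e) {
      throw new RuntimeException("Error during gzip decompression", e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    byte[] original = "carbondata gzip column compressor".getBytes(StandardCharsets.UTF_8);
    byte[] restored = decompress(compress(original));
    System.out.println(java.util.Arrays.equals(original, restored)); // prints true
  }
}
```

The key difference from the diff as posted is that an `IOException` surfaces to the caller as a runtime failure rather than leaving a silently truncated byte array behind.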


---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user shardul-cr7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240236819
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.compression;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.nio.ByteBuffer;
    +import java.nio.DoubleBuffer;
    +import java.nio.FloatBuffer;
    +import java.nio.IntBuffer;
    +import java.nio.LongBuffer;
    +import java.nio.ShortBuffer;
    +
    +import org.apache.carbondata.core.util.ByteUtil;
    +
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    +
    +public class GzipCompressor implements Compressor {
    +
    +  public GzipCompressor() {
    +  }
    +
    +  @Override public String getName() {
    +    return "gzip";
    +  }
    +
    +  /*
    +   * Method called for compressing the data and
    +   * return a byte array
    +   */
    +  private byte[] compressData(byte[] data) {
    +
    +    ByteArrayOutputStream bt = new ByteArrayOutputStream();
    +    try {
    +      GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
    +      try {
    +        gzos.write(data);
    +      } catch (IOException e) {
    +        e.printStackTrace();
    +      } finally {
    +        gzos.close();
    +      }
    +    } catch (IOException e) {
    +      e.printStackTrace();
    +    }
    +
    +    return bt.toByteArray();
    +  }
    +
    +  /*
    +   * Method called for decompressing the data and
    +   * return a byte array
    +   */
    +  private byte[] decompressData(byte[] data) {
    +
    +    ByteArrayInputStream bt = new ByteArrayInputStream(data);
    +    ByteArrayOutputStream bot = new ByteArrayOutputStream();
    +
    +    try {
    +      GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
    +      byte[] buffer = new byte[1024];
    +      int len;
    +
    +      while ((len = gzis.read(buffer)) != -1) {
    +        bot.write(buffer, 0, len);
    +      }
    +
    +    } catch (IOException e) {
    +      e.printStackTrace();
    +    }
    +
    +    return bot.toByteArray();
    --- End diff --
   
    Same reason as for ByteArrayOutputStream.close() mentioned above.
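The reasoning being referenced is that `ByteArrayOutputStream.close()` is documented as having no effect, so omitting the close on the in-memory output stream leaks nothing. A small stdlib-only sketch (not CarbonData code) showing that the buffer stays fully usable after `close()`:

```java
import java.io.ByteArrayOutputStream;

public class CloseNoOpDemo {
  public static void main(String[] args) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    bos.write(1);
    bos.write(2);
    bos.write(3);
    bos.close();   // documented no-op: releases no resources, throws nothing
    bos.write(4);  // still legal after close(), per the Javadoc
    System.out.println(bos.toByteArray().length); // prints 4
  }
}
```

This is why the review settles on not closing the `ByteArrayOutputStream` in `decompressData`: only the `GzipCompressorInputStream` wraps a real resource.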


---

[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2847
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1686/



---

[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2847
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1687/



---

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user shardul-cr7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2847#discussion_r240247558
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ---
    @@ -252,50 +253,94 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
            """.stripMargin)
       }
     
    -  test("test data loading with snappy compressor and offheap") {
    +  test("test data loading with different compressors and offheap") {
    +    for(comp <- compressors){
    +      CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
    --- End diff --
   
    By default, offheap sort is false for gzip/zstd, so a UT for this scenario is not required.


---

[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2847
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9947/



---