[GitHub] [carbondata] marchpure opened a new pull request #3607: [CARBONDATA-3670] Support compress offheap columnpage directly, avoding a copy of data from offhead to heap when compressed.

classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.

GitBox
ajantha-bhat commented on a change in pull request #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.
URL: https://github.com/apache/carbondata/pull/3607#discussion_r376873364
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/dimension/legacy/ComplexDimensionIndexCodec.java
 ##########
 @@ -46,12 +47,13 @@ public ColumnPageEncoder createEncoder(Map<String, String> parameter) {
     return new IndexStorageEncoder() {
       @Override
       void encodeIndexStorage(ColumnPage inputPage) {
-        BlockIndexerStorage<byte[][]> indexStorage =
-            new BlockIndexerStorageForShort(inputPage.getByteArrayPage(), false, false, false);
-        byte[] flattened = ByteUtil.flatten(indexStorage.getDataPage());
+        BlockIndexerStorage<ByteBuffer[]> indexStorage =
+            new BlockIndexerStorageForShort(inputPage.getByteBufferArrayPage(false),
+                false, false, false);
+        ByteBuffer flattened = ByteUtil.flatten(indexStorage.getDataPage());
         Compressor compressor = CompressorFactory.getInstance().getCompressor(
             inputPage.getColumnCompressorName());
-        byte[] compressed = compressor.compressByte(flattened);
+        byte[] compressed = ByteUtil.byteBufferToBytes(compressor.compressByte(flattened));
 
 Review comment:
   I think we have converted 2D byte array to bytebuffer, to support compression directly on the byte buffer.
   But now we have to convert to byte[] again !, possible to keep bytebuffer itself ?
   
   so compression gc might have reduced, but decompression gc might have increased now. Please compare the before and after performance and memory usage.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.
URL: https://github.com/apache/carbondata/pull/3607#discussion_r376873977
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/dimension/legacy/DictDimensionIndexCodec.java
 ##########
 @@ -46,18 +47,33 @@ public String getName() {
   public ColumnPageEncoder createEncoder(Map<String, String> parameter) {
     return new IndexStorageEncoder() {
       @Override
-      void encodeIndexStorage(ColumnPage inputPage) {
-        BlockIndexerStorage<byte[][]> indexStorage;
-        byte[][] data = inputPage.getByteArrayPage();
+      void encodeIndexStorage(ColumnPage input) {
+        BlockIndexerStorage<ByteBuffer[]> indexStorage;
+        boolean isDictionary = input.isLocalDictGeneratedPage();
+
+        // if need to build invertIndex or RLE, the columnpage should to be organized in Row,
+        // in the other words, we get the data of columnpage as an array, in which each element
+        // presenets a row. But if no need to build both invertIndex and RLE, it will increase
+        // extra overhead, considering data in columnpage was already stored as flattened data,
+        // and the compression is also on flattened  data, to organized data in ROW is actually
+        // increase the overheadof "Expand" and "Flatten" with on invertIndex and RLE.
+        // Overall, isFlatted presents do we flatten the data? if need to build invertIndex or RLE,
+        // isFlattened is set to ture, otherwise, isFlattened is set to false.
+        boolean isFlattened = !isInvertedIndex && !isDictionary;
+
+        // when isFlattened is true, data[0] is the flattened data of the columnpage.
+        // when isFlattened is false, data[i] is the ith row of the columnpage.
+        ByteBuffer[] data = input.getByteBufferArrayPage(isFlattened);
         if (isInvertedIndex) {
-          indexStorage = new BlockIndexerStorageForShort(data, true, false, isSort);
+          indexStorage = new BlockIndexerStorageForShort(data, isDictionary, !isDictionary, isSort);
 
 Review comment:
   Not a good practice to pass same flag with complementary values. Hardcode is ok or introduce new variable

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.
URL: https://github.com/apache/carbondata/pull/3607#discussion_r376874268
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/dimension/legacy/DirectDictDimensionIndexCodec.java
 ##########
 @@ -46,18 +47,33 @@ public String getName() {
   public ColumnPageEncoder createEncoder(Map<String, String> parameter) {
     return new IndexStorageEncoder() {
       @Override
-      void encodeIndexStorage(ColumnPage inputPage) {
-        BlockIndexerStorage<byte[][]> indexStorage;
-        byte[][] data = inputPage.getByteArrayPage();
+      void encodeIndexStorage(ColumnPage input) {
+        BlockIndexerStorage<ByteBuffer[]> indexStorage;
+        boolean isDictionary = input.isLocalDictGeneratedPage();
+
+        // if need to build invertIndex or RLE, the columnpage should to be organized in Row,
+        // in the other words, we get the data of columnpage as an array, in which each element
+        // presenets a row. But if no need to build both invertIndex and RLE, it will increase
+        // extra overhead, considering data in columnpage was already stored as flattened data,
+        // and the compression is also on flattened  data, to organized data in ROW is actually
+        // increase the overheadof "Expand" and "Flatten" with on invertIndex and RLE.
+        // Overall, isFlatted presents do we flatten the data? if need to build invertIndex or RLE,
+        // isFlattened is set to ture, otherwise, isFlattened is set to false.
+        boolean isFlattened = !isInvertedIndex && !isDictionary;
+
+        // when isFlattened is true, data[0] is the flattened data of the columnpage.
+        // when isFlattened is false, data[i] is the ith row of the columnpage.
+        ByteBuffer[] data = input.getByteBufferArrayPage(isFlattened);
         if (isInvertedIndex) {
-          indexStorage = new BlockIndexerStorageForShort(data, false, false, isSort);
+          indexStorage = new BlockIndexerStorageForShort(data, isDictionary, !isDictionary, isSort);
 
 Review comment:
   same as above

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on issue #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on issue #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.
URL: https://github.com/apache/carbondata/pull/3607#issuecomment-609595628
 
 
   I think this was already handled in @jackylk 's #3638
   
   Please check and close the PR if handled.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure closed pull request #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.

GitBox
In reply to this post by GitBox
marchpure closed pull request #3607: [CARBONDATA-3670] Support compress offheap data in columnpage directly, avoding a copy of data from offhead to heap before compressed.
URL: https://github.com/apache/carbondata/pull/3607
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
12