[GitHub] [carbondata] kevinjmh opened a new pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh opened a new pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
kevinjmh opened a new pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649
 
 
    ### Why is this PR needed?
   The initial purpose of this PR was to remove the conversion between `byte[][]` and `List` when applying RLE on datapage in `BlockIndexerStorageForNoInvertedIndexForShort`, especially case when rle won't give any benefit. After checking other subclass of `BlockIndexerStorage` whether has similar problem, we found many duplicate codes and clean them up also.
   
    ### What changes were proposed in this PR?
   - avoid some conversion between `byte[][]` and `List`
   - refactor classes of `BlockIndexerStorage` to remove duplicate codes
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#issuecomment-593278000
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2262/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#issuecomment-593278198
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/561/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#issuecomment-593290871
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/563/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#issuecomment-593325511
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2265/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#discussion_r386783042
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorage.java
 ##########
 @@ -50,7 +83,7 @@
    *
    * @param rowIds
    */
-  protected Map<String, short[]> rleEncodeOnRowId(short[] rowIds) {
+  protected void rleEncodeOnRowId(short[] rowIds) {
 
 Review comment:
   since you are modifying this, can you change the function name to a more intuitive name?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#discussion_r386783445
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForNoInvertedIndexForShort.java
 ##########
 @@ -17,85 +17,17 @@
 
 package org.apache.carbondata.core.datastore.columnar;
 
-import java.util.ArrayList;
-import java.util.List;
-
-import org.apache.carbondata.core.constants.CarbonCommonConstants;
-import org.apache.carbondata.core.util.ByteUtil;
-
 /**
  * Below class will be used to for no inverted index
+ * Only try applying RLE on data
  */
 public class BlockIndexerStorageForNoInvertedIndexForShort extends BlockIndexerStorage<byte[][]> {
 
 Review comment:
   Please change class name to a more understandable one

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#discussion_r386783589
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java
 ##########
 @@ -17,38 +17,25 @@
 
 package org.apache.carbondata.core.datastore.columnar;
 
-import java.util.ArrayList;
 import java.util.Arrays;
-import java.util.List;
-import java.util.Map;
-
-import org.apache.carbondata.core.constants.CarbonCommonConstants;
-import org.apache.carbondata.core.util.ByteUtil;
 
+/**
+ * Only try applying RLE on rowId and data
+ */
 public class BlockIndexerStorageForShort extends BlockIndexerStorage<byte[][]> {
 
 Review comment:
   Please change class name to a more understandable one

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#discussion_r386783634
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForNoDictionary.java
 ##########
 @@ -18,18 +18,14 @@
 package org.apache.carbondata.core.datastore.columnar;
 
 import java.util.Arrays;
-import java.util.Map;
 
 import org.apache.carbondata.core.metadata.datatype.DataType;
 
+/**
+ * Only try applying RLE on rowId
+ */
 public class BlockIndexerStorageForNoDictionary extends BlockIndexerStorage<Object[]> {
 
 Review comment:
   Please change class name to a more understandable one

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#issuecomment-595749651
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/652/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#issuecomment-595751540
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2357/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#discussion_r389219846
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForNoDictionary.java
 ##########
 @@ -18,18 +18,14 @@
 package org.apache.carbondata.core.datastore.columnar;
 
 import java.util.Arrays;
-import java.util.Map;
 
 import org.apache.carbondata.core.metadata.datatype.DataType;
 
+/**
+ * Only try applying RLE on rowId
+ */
 public class BlockIndexerStorageForNoDictionary extends BlockIndexerStorage<Object[]> {
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#discussion_r389219855
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java
 ##########
 @@ -17,38 +17,25 @@
 
 package org.apache.carbondata.core.datastore.columnar;
 
-import java.util.ArrayList;
 import java.util.Arrays;
-import java.util.List;
-import java.util.Map;
-
-import org.apache.carbondata.core.constants.CarbonCommonConstants;
-import org.apache.carbondata.core.util.ByteUtil;
 
+/**
+ * Only try applying RLE on rowId and data
+ */
 public class BlockIndexerStorageForShort extends BlockIndexerStorage<byte[][]> {
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#discussion_r389219864
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForNoInvertedIndexForShort.java
 ##########
 @@ -17,85 +17,17 @@
 
 package org.apache.carbondata.core.datastore.columnar;
 
-import java.util.ArrayList;
-import java.util.List;
-
-import org.apache.carbondata.core.constants.CarbonCommonConstants;
-import org.apache.carbondata.core.util.ByteUtil;
-
 /**
  * Below class will be used to for no inverted index
+ * Only try applying RLE on data
  */
 public class BlockIndexerStorageForNoInvertedIndexForShort extends BlockIndexerStorage<byte[][]> {
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kevinjmh commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
kevinjmh commented on a change in pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649#discussion_r389219878
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorage.java
 ##########
 @@ -50,7 +83,7 @@
    *
    * @param rowIds
    */
-  protected Map<String, short[]> rleEncodeOnRowId(short[] rowIds) {
+  protected void rleEncodeOnRowId(short[] rowIds) {
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage

GitBox
In reply to this post by GitBox
asfgit closed pull request #3649: [CARBONDATA-3730] Avoid data conversion and remove duplicate codes in BlockIndexerStorage
URL: https://github.com/apache/carbondata/pull/3649
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services