[GitHub] [carbondata] jackylk opened a new pull request #3606: [WIP] add compressor to file name and change default compressor to zstd

[GitHub] [carbondata] jackylk opened a new pull request #3606: [WIP] add compressor to file name and change default compressor to zstd

GitBox
jackylk opened a new pull request #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606
 
 
    ### Why is this PR needed?
    WIP
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd

GitBox
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-582822047
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/165/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-582837564
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1868/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-582910031
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/171/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-582933159
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1873/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583002412
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/172/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583027918
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1874/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583289470
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/176/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583311877
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1879/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583337934
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/177/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583357088
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1880/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583758880
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/184/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583764317
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1887/
   


[GitHub] [carbondata] niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376770522
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
 ##########
 @@ -285,17 +286,39 @@ public static String getSegmentPath(String tablePath, String segmentId) {
   }
 
   /**
-   * Gets data file name only with out path
-   *
-   * @param filePartNo          data file part number
-   * @param taskNo              task identifier
-   * @param factUpdateTimeStamp unique identifier to identify an update
-   * @return gets data file name only with out path
+   * Gets data file name only, without parent path
    */
   public static String getCarbonDataFileName(Integer filePartNo, String taskNo, int bucketNumber,
-      int batchNo, String factUpdateTimeStamp, String segmentNo) {
-    return DATA_PART_PREFIX + filePartNo + "-" + taskNo + BATCH_PREFIX + batchNo + "-"
-        + bucketNumber + "-" + segmentNo + "-" + factUpdateTimeStamp + CARBON_DATA_EXT;
+      int batchNo, String factUpdateTimeStamp, String segmentNo, String compressor) {
+    Objects.requireNonNull(filePartNo);
+    Objects.requireNonNull(taskNo);
+    Objects.requireNonNull(factUpdateTimeStamp);
+    Objects.requireNonNull(compressor);
+
+    // Start from CarbonData 2.0, the data file name patten is:
+    // partNo-taskNo-batchNo-bucketNo-segmentNo-timestamp.compressor.carbondata
+    // For example:
+    // part-0-0_batchno0-0-0-1580982686749.zstd.carbondata
+    //
+    // If the compressor name is missing, the file is compressed by snappy, which is
+    // the default compressor in CarbonData 1.x
+
+    return new StringBuffer().append(DATA_PART_PREFIX)
 
 Review comment:
   There is no need to use StringBuffer to build the string; plain string concatenation is fine here.
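
   To illustrate the point of this comment, here is a minimal sketch (not the PR's actual code) that builds the file name from the diff's comment both ways. The constant values are assumptions inferred from the example name `part-0-0_batchno0-0-0-1580982686749.zstd.carbondata`, and the class and method names are hypothetical.

```java
class DataFileNameSketch {
  // Assumed values, inferred from the example file name in the diff comment.
  static final String DATA_PART_PREFIX = "part-";
  static final String BATCH_PREFIX = "_batchno";
  static final String CARBON_DATA_EXT = ".carbondata";

  // Plain concatenation in a single expression, as the reviewer suggests.
  static String withConcat(int filePartNo, String taskNo, int bucketNumber, int batchNo,
      String timestamp, String segmentNo, String compressor) {
    return DATA_PART_PREFIX + filePartNo + "-" + taskNo + BATCH_PREFIX + batchNo + "-"
        + bucketNumber + "-" + segmentNo + "-" + timestamp + "." + compressor
        + CARBON_DATA_EXT;
  }

  // Explicit StringBuilder; unsynchronized, unlike StringBuffer.
  static String withBuilder(int filePartNo, String taskNo, int bucketNumber, int batchNo,
      String timestamp, String segmentNo, String compressor) {
    return new StringBuilder()
        .append(DATA_PART_PREFIX).append(filePartNo)
        .append('-').append(taskNo)
        .append(BATCH_PREFIX).append(batchNo)
        .append('-').append(bucketNumber)
        .append('-').append(segmentNo)
        .append('-').append(timestamp)
        .append('.').append(compressor)
        .append(CARBON_DATA_EXT)
        .toString();
  }

  public static void main(String[] args) {
    String a = withConcat(0, "0", 0, 0, "1580982686749", "0", "zstd");
    String b = withBuilder(0, "0", 0, 0, "1580982686749", "0", "zstd");
    System.out.println(a);
    System.out.println(a.equals(b));
  }
}
```

   For a single expression like this, javac already compiles `+` into an efficient append sequence, so the two forms differ mainly in readability. An explicit builder tends to pay off when the string is assembled across a loop.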


[GitHub] [carbondata] niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376770113
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/readcommitter/LatestFilesReadCommittedScope.java
 ##########
 @@ -163,7 +163,7 @@ public SegmentRefreshInfo getCommittedSegmentRefreshInfo(Segment segment, Update
     return segmentRefreshInfo;
   }
 
-  private String getSegmentID(String carbonIndexFileName, String indexFilePath) {
+  private String getTimestamp(String carbonIndexFileName, String indexFilePath) {
 
 Review comment:
   Why change the method name to getTimestamp?


[GitHub] [carbondata] niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376769368
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##########
 @@ -1083,7 +1083,7 @@ private CarbonCommonConstants() {
    * The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty.
    * Specially, empty means that Carbondata will not compress the sort temp files.
    */
-  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "SNAPPY";
+  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "zstd";
 
 Review comment:
   ```suggestion
     public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "ZSTD";
   ```
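
   The suggestion keeps the default in the same case as the values listed in the Javadoc above. The thread does not say whether CarbonData compares compressor names case-insensitively, so the following is only a hypothetical sketch of normalizing a configured name so that `zstd` and `ZSTD` are treated alike. The class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

class SortTempCompressorSketch {
  // Names taken from the Javadoc quoted in the diff; the empty string means
  // "do not compress the sort temp files".
  static final Set<String> SUPPORTED =
      new HashSet<>(Arrays.asList("SNAPPY", "GZIP", "BZIP2", "LZ4", "ZSTD", ""));

  // Hypothetical helper: upper-case the configured value with a fixed locale
  // and reject anything that is not in the documented list.
  static String normalize(String configured) {
    Objects.requireNonNull(configured, "compressor name");
    String name = configured.trim().toUpperCase(Locale.ROOT);
    if (!SUPPORTED.contains(name)) {
      throw new IllegalArgumentException("Unsupported sort temp compressor: " + configured);
    }
    return name;
  }

  public static void main(String[] args) {
    System.out.println(normalize("zstd").equals(normalize("ZSTD")));
  }
}
```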


[GitHub] [carbondata] jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376776809
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##########
 @@ -1083,7 +1083,7 @@ private CarbonCommonConstants() {
    * The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty.
    * Specially, empty means that Carbondata will not compress the sort temp files.
    */
-  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "SNAPPY";
+  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "zstd";
 
 Review comment:
   fixed


[GitHub] [carbondata] jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376776975
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/readcommitter/LatestFilesReadCommittedScope.java
 ##########
 @@ -163,7 +163,7 @@ public SegmentRefreshInfo getCommittedSegmentRefreshInfo(Segment segment, Update
     return segmentRefreshInfo;
   }
 
-  private String getSegmentID(String carbonIndexFileName, String indexFilePath) {
+  private String getTimestamp(String carbonIndexFileName, String indexFilePath) {
 
 Review comment:
   I changed it back.


[GitHub] [carbondata] jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376777589
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
 ##########
 @@ -285,17 +286,39 @@ public static String getSegmentPath(String tablePath, String segmentId) {
   }
 
   /**
-   * Gets data file name only with out path
-   *
-   * @param filePartNo          data file part number
-   * @param taskNo              task identifier
-   * @param factUpdateTimeStamp unique identifier to identify an update
-   * @return gets data file name only with out path
+   * Gets data file name only, without parent path
    */
   public static String getCarbonDataFileName(Integer filePartNo, String taskNo, int bucketNumber,
-      int batchNo, String factUpdateTimeStamp, String segmentNo) {
-    return DATA_PART_PREFIX + filePartNo + "-" + taskNo + BATCH_PREFIX + batchNo + "-"
-        + bucketNumber + "-" + segmentNo + "-" + factUpdateTimeStamp + CARBON_DATA_EXT;
+      int batchNo, String factUpdateTimeStamp, String segmentNo, String compressor) {
+    Objects.requireNonNull(filePartNo);
+    Objects.requireNonNull(taskNo);
+    Objects.requireNonNull(factUpdateTimeStamp);
+    Objects.requireNonNull(compressor);
+
+    // Start from CarbonData 2.0, the data file name patten is:
+    // partNo-taskNo-batchNo-bucketNo-segmentNo-timestamp.compressor.carbondata
+    // For example:
+    // part-0-0_batchno0-0-0-1580982686749.zstd.carbondata
+    //
+    // If the compressor name is missing, the file is compressed by snappy, which is
+    // the default compressor in CarbonData 1.x
+
+    return new StringBuffer().append(DATA_PART_PREFIX)
 
 Review comment:
   I changed it to StringBuilder; this link (https://stackoverflow.com/questions/47605/string-concatenation-concat-vs-operator) suggests StringBuilder is more efficient.
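
   The naming rule described in the diff comment above (compressor name placed before `.carbondata`, with snappy assumed when it is missing) could be read back roughly as follows. This is a sketch under assumptions, not code from the PR: the class name, method name and extension constant are hypothetical, and it assumes no other field of the file name contains a dot.

```java
class CompressorFromFileNameSketch {
  // Assumed extension, taken from the example file name in the diff comment.
  static final String CARBON_DATA_EXT = ".carbondata";
  // Per the diff comment, files without a compressor name were written with snappy.
  static final String LEGACY_DEFAULT = "snappy";

  // Returns the compressor encoded in the file name, or the legacy default
  // when the name follows the older pattern without a compressor part.
  static String compressorOf(String fileName) {
    if (!fileName.endsWith(CARBON_DATA_EXT)) {
      throw new IllegalArgumentException("Not a carbondata file: " + fileName);
    }
    String stem = fileName.substring(0, fileName.length() - CARBON_DATA_EXT.length());
    int dot = stem.lastIndexOf('.');
    return dot < 0 ? LEGACY_DEFAULT : stem.substring(dot + 1);
  }

  public static void main(String[] args) {
    System.out.println(compressorOf("part-0-0_batchno0-0-0-1580982686749.zstd.carbondata"));
    System.out.println(compressorOf("part-0-0_batchno0-0-0-1580982686749.carbondata"));
  }
}
```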


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583838300
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/193/
   
