jackylk opened a new pull request #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606

### Why is this PR needed?
WIP

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-582822047 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/165/
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-582837564 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1868/
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-582910031 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/171/
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-582933159 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1873/
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583002412 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/172/
CarbonDataQA1 commented on issue #3606: [WIP] add compressor to file name and change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583027918 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1874/
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583289470 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/176/
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583311877 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1879/
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583337934 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/177/
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583357088 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1880/
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583758880 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/184/
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583764317 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1887/
niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376770522

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##########
@@ -285,17 +286,39 @@ public static String getSegmentPath(String tablePath, String segmentId) {
   }
 
   /**
-   * Gets data file name only with out path
-   *
-   * @param filePartNo data file part number
-   * @param taskNo task identifier
-   * @param factUpdateTimeStamp unique identifier to identify an update
-   * @return gets data file name only with out path
+   * Gets data file name only, without parent path
    */
   public static String getCarbonDataFileName(Integer filePartNo, String taskNo, int bucketNumber,
-      int batchNo, String factUpdateTimeStamp, String segmentNo) {
-    return DATA_PART_PREFIX + filePartNo + "-" + taskNo + BATCH_PREFIX + batchNo + "-"
-        + bucketNumber + "-" + segmentNo + "-" + factUpdateTimeStamp + CARBON_DATA_EXT;
+      int batchNo, String factUpdateTimeStamp, String segmentNo, String compressor) {
+    Objects.requireNonNull(filePartNo);
+    Objects.requireNonNull(taskNo);
+    Objects.requireNonNull(factUpdateTimeStamp);
+    Objects.requireNonNull(compressor);
+
+    // Start from CarbonData 2.0, the data file name patten is:
+    // partNo-taskNo-batchNo-bucketNo-segmentNo-timestamp.compressor.carbondata
+    // For example:
+    // part-0-0_batchno0-0-0-1580982686749.zstd.carbondata
+    //
+    // If the compressor name is missing, the file is compressed by snappy, which is
+    // the default compressor in CarbonData 1.x
+
+    return new StringBuffer().append(DATA_PART_PREFIX)

Review comment:
There is no need to use StringBuffer to build the string; plain string concatenation is fine.
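For readers following the review comment above, here is a minimal sketch of the file name pattern described in the diff, built once with plain concatenation and once with an explicit StringBuilder. The class name and the three constant values are assumptions inferred from the example name in the diff comment (`part-0-0_batchno0-0-0-1580982686749.zstd.carbondata`); the real constants live in CarbonTablePath and are not shown in this thread.

```java
import java.util.Objects;

// Hypothetical sketch class; not part of CarbonData.
final class DataFileNameSketch {
  // Assumed values, inferred from the example file name quoted in the diff.
  static final String DATA_PART_PREFIX = "part-";
  static final String BATCH_PREFIX = "_batchno";
  static final String CARBON_DATA_EXT = ".carbondata";

  // Rejects null for the reference-typed arguments, mirroring the checks in the diff.
  private static void check(String taskNo, String timestamp, String segmentNo,
      String compressor) {
    Objects.requireNonNull(taskNo);
    Objects.requireNonNull(timestamp);
    Objects.requireNonNull(segmentNo);
    Objects.requireNonNull(compressor);
  }

  /** Builds the name with plain concatenation, as the reviewer suggests. */
  static String withConcat(int filePartNo, String taskNo, int bucketNumber, int batchNo,
      String factUpdateTimeStamp, String segmentNo, String compressor) {
    check(taskNo, factUpdateTimeStamp, segmentNo, compressor);
    return DATA_PART_PREFIX + filePartNo + "-" + taskNo + BATCH_PREFIX + batchNo + "-"
        + bucketNumber + "-" + segmentNo + "-" + factUpdateTimeStamp
        + "." + compressor + CARBON_DATA_EXT;
  }

  /** Builds the same name with an explicit StringBuilder. */
  static String withBuilder(int filePartNo, String taskNo, int bucketNumber, int batchNo,
      String factUpdateTimeStamp, String segmentNo, String compressor) {
    check(taskNo, factUpdateTimeStamp, segmentNo, compressor);
    return new StringBuilder()
        .append(DATA_PART_PREFIX).append(filePartNo)
        .append('-').append(taskNo)
        .append(BATCH_PREFIX).append(batchNo)
        .append('-').append(bucketNumber)
        .append('-').append(segmentNo)
        .append('-').append(factUpdateTimeStamp)
        .append('.').append(compressor)
        .append(CARBON_DATA_EXT)
        .toString();
  }

  public static void main(String[] args) {
    String a = withConcat(0, "0", 0, 0, "1580982686749", "0", "zstd");
    String b = withBuilder(0, "0", 0, 0, "1580982686749", "0", "zstd");
    System.out.println(a);
    System.out.println(a.equals(b));
  }
}
```

For a single expression like this one, the Java compiler already translates `+` concatenation into StringBuilder calls (or an invokedynamic call on Java 9 and later), so the choice between the two forms is mostly about readability. StringBuffer additionally synchronizes every call, which a method-local buffer does not need.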
niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376770113

##########
File path: core/src/main/java/org/apache/carbondata/core/readcommitter/LatestFilesReadCommittedScope.java
##########
@@ -163,7 +163,7 @@ public SegmentRefreshInfo getCommittedSegmentRefreshInfo(Segment segment, Update
     return segmentRefreshInfo;
   }
 
-  private String getSegmentID(String carbonIndexFileName, String indexFilePath) {
+  private String getTimestamp(String carbonIndexFileName, String indexFilePath) {

Review comment:
Why change the method name to getTimestamp?
niuge01 commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376769368

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1083,7 +1083,7 @@ private CarbonCommonConstants() {
    * The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty.
    * Specially, empty means that Carbondata will not compress the sort temp files.
    */
-  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "SNAPPY";
+  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "zstd";

Review comment:
```suggestion
  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "ZSTD";
```
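The suggestion above turns on letter case: the Javadoc lists the accepted values in upper case, while the patch wrote the default in lower case. Whether CarbonData compares these names case-insensitively is not stated in this thread. As a hedged illustration only, the sketch below shows one way a configured value could be normalized and checked against the documented list; the class and method names are hypothetical and do not come from CarbonData.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch class; not part of CarbonData.
final class SortTempCompressorSketch {
  // The values listed in the Javadoc quoted in the diff; empty means "no compression".
  static final Set<String> SUPPORTED = Collections.unmodifiableSet(
      new HashSet<>(Arrays.asList("SNAPPY", "GZIP", "BZIP2", "LZ4", "ZSTD", "")));

  /** Upper-cases the configured name and rejects anything outside the documented list. */
  static String normalize(String configured) {
    Objects.requireNonNull(configured, "configured");
    // Locale.ROOT keeps the result independent of the JVM's default locale.
    String name = configured.trim().toUpperCase(Locale.ROOT);
    if (!SUPPORTED.contains(name)) {
      throw new IllegalArgumentException("Unsupported sort temp compressor: " + configured);
    }
    return name;
  }

  public static void main(String[] args) {
    System.out.println(normalize("zstd"));
    System.out.println(normalize("Snappy"));
  }
}
```

Normalizing at one point means a default written as "zstd" or "ZSTD" would behave the same, but keeping the constant in the same case as the documented values, as the reviewer suggests, avoids relying on that.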
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376776809

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1083,7 +1083,7 @@ private CarbonCommonConstants() {
    * The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty.
    * Specially, empty means that Carbondata will not compress the sort temp files.
    */
-  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "SNAPPY";
+  public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "zstd";

Review comment:
fixed
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376776975

##########
File path: core/src/main/java/org/apache/carbondata/core/readcommitter/LatestFilesReadCommittedScope.java
##########
@@ -163,7 +163,7 @@ public SegmentRefreshInfo getCommittedSegmentRefreshInfo(Segment segment, Update
     return segmentRefreshInfo;
   }
 
-  private String getSegmentID(String carbonIndexFileName, String indexFilePath) {
+  private String getTimestamp(String carbonIndexFileName, String indexFilePath) {

Review comment:
I changed it back.
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376777589

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##########
@@ -285,17 +286,39 @@ public static String getSegmentPath(String tablePath, String segmentId) {
   }
 
   /**
-   * Gets data file name only with out path
-   *
-   * @param filePartNo data file part number
-   * @param taskNo task identifier
-   * @param factUpdateTimeStamp unique identifier to identify an update
-   * @return gets data file name only with out path
+   * Gets data file name only, without parent path
    */
   public static String getCarbonDataFileName(Integer filePartNo, String taskNo, int bucketNumber,
-      int batchNo, String factUpdateTimeStamp, String segmentNo) {
-    return DATA_PART_PREFIX + filePartNo + "-" + taskNo + BATCH_PREFIX + batchNo + "-"
-        + bucketNumber + "-" + segmentNo + "-" + factUpdateTimeStamp + CARBON_DATA_EXT;
+      int batchNo, String factUpdateTimeStamp, String segmentNo, String compressor) {
+    Objects.requireNonNull(filePartNo);
+    Objects.requireNonNull(taskNo);
+    Objects.requireNonNull(factUpdateTimeStamp);
+    Objects.requireNonNull(compressor);
+
+    // Start from CarbonData 2.0, the data file name patten is:
+    // partNo-taskNo-batchNo-bucketNo-segmentNo-timestamp.compressor.carbondata
+    // For example:
+    // part-0-0_batchno0-0-0-1580982686749.zstd.carbondata
+    //
+    // If the compressor name is missing, the file is compressed by snappy, which is
+    // the default compressor in CarbonData 1.x
+
+    return new StringBuffer().append(DATA_PART_PREFIX)

Review comment:
I changed it to StringBuilder; this link (https://stackoverflow.com/questions/47605/string-concatenation-concat-vs-operator) suggests StringBuilder is more efficient.
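The code comment in the hunk above also implies a rule for the reading side: a name without a compressor segment belongs to a file written with snappy, the CarbonData 1.x default. The sketch below illustrates that rule under stated assumptions; the class, method, and constant names are hypothetical, the extension value is inferred from the example file name, and the sketch assumes the rest of the name contains no other dots. How the PR itself extracts the compressor is not shown in this thread.

```java
import java.util.Objects;

// Hypothetical sketch class; not part of CarbonData.
final class CompressorFromFileNameSketch {
  // Assumed value, inferred from the example file name in the diff comment.
  static final String CARBON_DATA_EXT = ".carbondata";
  // Per the diff comment: a missing compressor segment means snappy (the 1.x default).
  static final String LEGACY_DEFAULT = "snappy";

  /** Returns the compressor segment of a data file name, or snappy if there is none. */
  static String compressorOf(String dataFileName) {
    Objects.requireNonNull(dataFileName, "dataFileName");
    if (!dataFileName.endsWith(CARBON_DATA_EXT)) {
      throw new IllegalArgumentException("Not a carbondata file name: " + dataFileName);
    }
    // Drop the extension, then look for a "." that separates the timestamp
    // from the compressor name.
    String stem = dataFileName.substring(0, dataFileName.length() - CARBON_DATA_EXT.length());
    int dot = stem.lastIndexOf('.');
    return dot < 0 ? LEGACY_DEFAULT : stem.substring(dot + 1);
  }

  public static void main(String[] args) {
    System.out.println(compressorOf("part-0-0_batchno0-0-0-1580982686749.zstd.carbondata"));
    System.out.println(compressorOf("part-0-0_batchno0-0-0-1580982686749.carbondata"));
  }
}
```

Falling back to snappy for names without a compressor segment is what would let files written before the default changed remain readable.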
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583838300 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/193/