CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583844201 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1895/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583860628 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/200/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583867782 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1902/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583925032 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/203/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
QiangCai commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583934608 better to do further testing on a cluster, not only a local machine. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583935130 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1905/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583959274 @QiangCai ok, I will verify on cluster ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk edited a comment on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583959274 @QiangCai ok, I will verify on cluster, and with old table ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk edited a comment on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-583959274 @QiangCai ok, I will verify on cluster, and with old table created before this PR ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376948840 ########## File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java ########## @@ -285,17 +286,38 @@ public static String getSegmentPath(String tablePath, String segmentId) { } /** - * Gets data file name only with out path - * - * @param filePartNo data file part number - * @param taskNo task identifier - * @param factUpdateTimeStamp unique identifier to identify an update - * @return gets data file name only with out path + * Gets data file name only, without parent path */ public static String getCarbonDataFileName(Integer filePartNo, String taskNo, int bucketNumber, - int batchNo, String factUpdateTimeStamp, String segmentNo) { - return DATA_PART_PREFIX + filePartNo + "-" + taskNo + BATCH_PREFIX + batchNo + "-" - + bucketNumber + "-" + segmentNo + "-" + factUpdateTimeStamp + CARBON_DATA_EXT; + int batchNo, String factUpdateTimeStamp, String segmentNo, String compressor) { + Objects.requireNonNull(filePartNo); + Objects.requireNonNull(taskNo); + Objects.requireNonNull(factUpdateTimeStamp); + Objects.requireNonNull(compressor); + + // Start from CarbonData 2.0, the data file name patten is: + // partNo-taskNo-batchNo-bucketNo-segmentNo-timestamp.compressor.carbondata + // For example: + // part-0-0_batchno0-0-0-1580982686749.zstd.carbondata + // + // If the compressor name is missing, the file is compressed by snappy, which is + // the default compressor in CarbonData 1.x + + return new StringBuilder() + .append(DATA_PART_PREFIX) + .append(filePartNo) + .append("-") + .append(taskNo) + .append(BATCH_PREFIX) + .append(batchNo) + .append("-") + .append(bucketNumber) + .append("-") + .append(segmentNo) + .append("-") + .append(factUpdateTimeStamp) Review comment: supposed to append compressor name here ? I see that argument `compressor` is not used. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376988998 ########## File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/TableLevelCompactionOptionTest.scala ########## @@ -139,6 +139,8 @@ class TableLevelCompactionOptionTest extends QueryTest sql(s"LOAD DATA LOCAL INPATH '$tempFilePath' INTO TABLE carbon_table") } + sql("SHOW SEGMENTS FOR TABLE carbon_table").show Review comment: revert this or add validation ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376989476 ########## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala ########## @@ -141,11 +141,6 @@ private[sql] case class CarbonDescribeFormattedCommand( Strings.formatSize( tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""), - ("Data File Compressor ", tblProps Review comment: why removed this ? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376990515 ########## File path: streaming/src/main/java/org/apache/carbondata/streaming/CarbonStreamRecordWriter.java ########## @@ -188,7 +190,7 @@ private void initializeAtFirstRow() throws IOException { compressorName = carbonTable.getTableInfo().getFactTable().getTableProperties().get( CarbonCommonConstants.COMPRESSOR); if (null == compressorName) { - compressorName = CompressorFactory.getInstance().getCompressor().getName(); + compressorName = CompressorFactory.NativeSupportedCompressor.SNAPPY.getName(); Review comment: why always snappy for streaming segment ? before I think it was supporting configured compressor ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r376990677 ########## File path: streaming/src/main/java/org/apache/carbondata/streaming/CarbonStreamRecordWriter.java ########## @@ -140,7 +140,9 @@ private void initialize(TaskAttemptContext job) throws IOException { segmentDir = CarbonTablePath.getSegmentPath( carbonTable.getAbsoluteTableIdentifier().getTablePath(), segmentId); - fileName = CarbonTablePath.getCarbonDataFileName(0, taskNo + "", 0, 0, "0", segmentId); + fileName = CarbonTablePath.getCarbonDataFileName( Review comment: why always snappy for streaming segment ? before I think it was supporting configured compressor ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r377100880 ########## File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java ########## @@ -285,17 +286,38 @@ public static String getSegmentPath(String tablePath, String segmentId) { } /** - * Gets data file name only with out path - * - * @param filePartNo data file part number - * @param taskNo task identifier - * @param factUpdateTimeStamp unique identifier to identify an update - * @return gets data file name only with out path + * Gets data file name only, without parent path */ public static String getCarbonDataFileName(Integer filePartNo, String taskNo, int bucketNumber, - int batchNo, String factUpdateTimeStamp, String segmentNo) { - return DATA_PART_PREFIX + filePartNo + "-" + taskNo + BATCH_PREFIX + batchNo + "-" - + bucketNumber + "-" + segmentNo + "-" + factUpdateTimeStamp + CARBON_DATA_EXT; + int batchNo, String factUpdateTimeStamp, String segmentNo, String compressor) { + Objects.requireNonNull(filePartNo); + Objects.requireNonNull(taskNo); + Objects.requireNonNull(factUpdateTimeStamp); + Objects.requireNonNull(compressor); + + // Start from CarbonData 2.0, the data file name patten is: + // partNo-taskNo-batchNo-bucketNo-segmentNo-timestamp.compressor.carbondata + // For example: + // part-0-0_batchno0-0-0-1580982686749.zstd.carbondata + // + // If the compressor name is missing, the file is compressed by snappy, which is + // the default compressor in CarbonData 1.x + + return new StringBuilder() + .append(DATA_PART_PREFIX) + .append(filePartNo) + .append("-") + .append(taskNo) + .append(BATCH_PREFIX) + .append(batchNo) + .append("-") + .append(bucketNumber) + .append("-") + .append(segmentNo) + .append("-") + .append(factUpdateTimeStamp) Review comment: yes, I wanted to test once without changing the file name, in order to simulate using new jar to test on old files. As the CI is passed, so that the compatibility is ok. I will add the compressor name in file name now ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r377101401 ########## File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/TableLevelCompactionOptionTest.scala ########## @@ -139,6 +139,8 @@ class TableLevelCompactionOptionTest extends QueryTest sql(s"LOAD DATA LOCAL INPATH '$tempFilePath' INTO TABLE carbon_table") } + sql("SHOW SEGMENTS FOR TABLE carbon_table").show Review comment: fixed ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r377102667 ########## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala ########## @@ -141,11 +141,6 @@ private[sql] case class CarbonDescribeFormattedCommand( Strings.formatSize( tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""), - ("Data File Compressor ", tblProps Review comment: Because the default compressor changed, so this print is not correct now. And there is no way to know the compressor unless you check the file name. I think this is better since very loading can use different compressor, so we should not show in DESC which is table level ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r377103094 ########## File path: streaming/src/main/java/org/apache/carbondata/streaming/CarbonStreamRecordWriter.java ########## @@ -188,7 +190,7 @@ private void initializeAtFirstRow() throws IOException { compressorName = carbonTable.getTableInfo().getFactTable().getTableProperties().get( CarbonCommonConstants.COMPRESSOR); if (null == compressorName) { - compressorName = CompressorFactory.getInstance().getCompressor().getName(); + compressorName = CompressorFactory.NativeSupportedCompressor.SNAPPY.getName(); Review comment: There is testcase failing if set to ZSTD. Need another PR for it ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#discussion_r377103325 ########## File path: streaming/src/main/java/org/apache/carbondata/streaming/CarbonStreamRecordWriter.java ########## @@ -140,7 +140,9 @@ private void initialize(TaskAttemptContext job) throws IOException { segmentDir = CarbonTablePath.getSegmentPath( carbonTable.getAbsoluteTableIdentifier().getTablePath(), segmentId); - fileName = CarbonTablePath.getCarbonDataFileName(0, taskNo + "", 0, 0, "0", segmentId); + fileName = CarbonTablePath.getCarbonDataFileName( Review comment: There is testcase failing if set to ZSTD. Need another PR for it ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3606: [CARBONDATA-3681] Change default compressor to zstd
URL: https://github.com/apache/carbondata/pull/3606#issuecomment-584166330 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/216/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
Free forum by Nabble | Edit this page |