CarbonDataQA2 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-797430674 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3797/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
liuhe0702 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-797432137 retest this please ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-797496773 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5565/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-797499421 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3799/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-800028090 @liuhe0702 i think this issue is already being handled in #3988 , so i think no need of this PR, please refer the jira to the jira of #3988 and close as duplicate. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 edited a comment on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-800028090 @liuhe0702 i think this issue is already being handled in #3988 , so i think no need of this PR, please refer the jira to the jira of #3988 and close as duplicate. @ShreelekhyaG please confirm the same here, where this scenario is being handled or not in your PR. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-800030287 Yes @akashrn5/ @liuhe0702 , the same scenario being handled in PR #3988 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-800031297 @liuhe0702 please make the jira as duplicate and comment there as its handled in #3988 and close both jira and PR, thanks ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
liuhe0702 closed pull request #4104: URL: https://github.com/apache/carbondata/pull/4104 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on a change in pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#discussion_r596907341 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ########## @@ -296,26 +299,50 @@ class CarbonTableCompactor( if (finalMergeStatus) { val mergedLoadNumber = CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName) - var segmentFilesForIUDCompact = new util.ArrayList[Segment]() var segmentFileName: String = null + + val mergeIndex = CarbonProperties.getInstance().getProperty( + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT).toBoolean + + if (compactionType != CompactionType.IUD_DELETE_DELTA && mergeIndex) { + MergeIndexUtil.mergeIndexFilesOnCompaction(compactionCallableModel) + } + if (carbonTable.isHivePartitionTable) { - val readPath = - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath) + - CarbonCommonConstants.FILE_SEPARATOR + carbonLoadModel.getFactTimeStamp + ".tmp" - // Merge all partition files into a single file. - segmentFileName = - mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp - val segmentFile = SegmentFileStore - .mergeSegmentFiles(readPath, - segmentFileName, - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath)) - if (segmentFile != null) { - SegmentFileStore - .moveFromTempFolder(segmentFile, - carbonLoadModel.getFactTimeStamp + ".tmp", - carbonLoadModel.getTablePath) + if (mergeIndex) { + val segmentTmpFileName = carbonLoadModel.getFactTimeStamp + CarbonTablePath.SEGMENT_EXT + segmentFileName = mergedLoadNumber + "_" + segmentTmpFileName + val segmentTmpFile = FileFactory.getCarbonFile( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentTmpFileName)) + if (!segmentTmpFile.renameForce( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentFileName))) { + throw new Exception(s"Rename segment file from ${segmentTmpFileName} " + + s"to ${segmentFileName} failed.") + } + val tmpPath = Review comment: If we are not using tmp segment files when mergeIndex is true, may be we can avoid writing it. pls check once in `RowResultMergerProcessor `and `CompactionResultSortProcessor`. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-802047900 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5585/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-802052102 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3820/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
liuhe0702 commented on a change in pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#discussion_r597028077 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ########## @@ -296,26 +299,50 @@ class CarbonTableCompactor( if (finalMergeStatus) { val mergedLoadNumber = CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName) - var segmentFilesForIUDCompact = new util.ArrayList[Segment]() var segmentFileName: String = null + + val mergeIndex = CarbonProperties.getInstance().getProperty( + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT).toBoolean + + if (compactionType != CompactionType.IUD_DELETE_DELTA && mergeIndex) { + MergeIndexUtil.mergeIndexFilesOnCompaction(compactionCallableModel) + } + if (carbonTable.isHivePartitionTable) { - val readPath = - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath) + - CarbonCommonConstants.FILE_SEPARATOR + carbonLoadModel.getFactTimeStamp + ".tmp" - // Merge all partition files into a single file. - segmentFileName = - mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp - val segmentFile = SegmentFileStore - .mergeSegmentFiles(readPath, - segmentFileName, - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath)) - if (segmentFile != null) { - SegmentFileStore - .moveFromTempFolder(segmentFile, - carbonLoadModel.getFactTimeStamp + ".tmp", - carbonLoadModel.getTablePath) + if (mergeIndex) { + val segmentTmpFileName = carbonLoadModel.getFactTimeStamp + CarbonTablePath.SEGMENT_EXT + segmentFileName = mergedLoadNumber + "_" + segmentTmpFileName + val segmentTmpFile = FileFactory.getCarbonFile( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentTmpFileName)) + if (!segmentTmpFile.renameForce( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentFileName))) { + throw new Exception(s"Rename segment file from ${segmentTmpFileName} " + + s"to ${segmentFileName} failed.") + } + val tmpPath = Review comment: tmpPath is used. During merging index, each task for partition will generated a temporaty segment file in tmpPath and all the temporaty segment files are merged into a final segment file. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
liuhe0702 commented on a change in pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#discussion_r597028077 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ########## @@ -296,26 +299,50 @@ class CarbonTableCompactor( if (finalMergeStatus) { val mergedLoadNumber = CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName) - var segmentFilesForIUDCompact = new util.ArrayList[Segment]() var segmentFileName: String = null + + val mergeIndex = CarbonProperties.getInstance().getProperty( + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT).toBoolean + + if (compactionType != CompactionType.IUD_DELETE_DELTA && mergeIndex) { + MergeIndexUtil.mergeIndexFilesOnCompaction(compactionCallableModel) + } + if (carbonTable.isHivePartitionTable) { - val readPath = - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath) + - CarbonCommonConstants.FILE_SEPARATOR + carbonLoadModel.getFactTimeStamp + ".tmp" - // Merge all partition files into a single file. - segmentFileName = - mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp - val segmentFile = SegmentFileStore - .mergeSegmentFiles(readPath, - segmentFileName, - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath)) - if (segmentFile != null) { - SegmentFileStore - .moveFromTempFolder(segmentFile, - carbonLoadModel.getFactTimeStamp + ".tmp", - carbonLoadModel.getTablePath) + if (mergeIndex) { + val segmentTmpFileName = carbonLoadModel.getFactTimeStamp + CarbonTablePath.SEGMENT_EXT + segmentFileName = mergedLoadNumber + "_" + segmentTmpFileName + val segmentTmpFile = FileFactory.getCarbonFile( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentTmpFileName)) + if (!segmentTmpFile.renameForce( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentFileName))) { + throw new Exception(s"Rename segment file from ${segmentTmpFileName} " + + s"to ${segmentFileName} failed.") + } + val tmpPath = Review comment: tmpPath is used. During merging index, each task for partition will generated a temporaty segment file in tmpPath and all the temporaty segment files are merged into a final segment file. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
liuhe0702 commented on a change in pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#discussion_r597382404 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ########## @@ -296,26 +299,50 @@ class CarbonTableCompactor( if (finalMergeStatus) { val mergedLoadNumber = CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName) - var segmentFilesForIUDCompact = new util.ArrayList[Segment]() var segmentFileName: String = null + + val mergeIndex = CarbonProperties.getInstance().getProperty( + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT).toBoolean + + if (compactionType != CompactionType.IUD_DELETE_DELTA && mergeIndex) { + MergeIndexUtil.mergeIndexFilesOnCompaction(compactionCallableModel) + } + if (carbonTable.isHivePartitionTable) { - val readPath = - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath) + - CarbonCommonConstants.FILE_SEPARATOR + carbonLoadModel.getFactTimeStamp + ".tmp" - // Merge all partition files into a single file. - segmentFileName = - mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp - val segmentFile = SegmentFileStore - .mergeSegmentFiles(readPath, - segmentFileName, - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath)) - if (segmentFile != null) { - SegmentFileStore - .moveFromTempFolder(segmentFile, - carbonLoadModel.getFactTimeStamp + ".tmp", - carbonLoadModel.getTablePath) + if (mergeIndex) { + val segmentTmpFileName = carbonLoadModel.getFactTimeStamp + CarbonTablePath.SEGMENT_EXT + segmentFileName = mergedLoadNumber + "_" + segmentTmpFileName + val segmentTmpFile = FileFactory.getCarbonFile( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentTmpFileName)) + if (!segmentTmpFile.renameForce( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentFileName))) { + throw new Exception(s"Rename segment file from ${segmentTmpFileName} " + + s"to ${segmentFileName} failed.") + } + val tmpPath = Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#discussion_r597407885 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ########## @@ -296,26 +299,46 @@ class CarbonTableCompactor( if (finalMergeStatus) { val mergedLoadNumber = CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName) - var segmentFilesForIUDCompact = new util.ArrayList[Segment]() var segmentFileName: String = null + + val isMergeIndexEnabled = CarbonProperties.getInstance().getProperty( + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT).toBoolean + + if (compactionType != CompactionType.IUD_DELETE_DELTA && isMergeIndexEnabled) { + MergeIndexUtil.mergeIndexFilesOnCompaction(compactionCallableModel) + } + if (carbonTable.isHivePartitionTable) { - val readPath = - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath) + - CarbonCommonConstants.FILE_SEPARATOR + carbonLoadModel.getFactTimeStamp + ".tmp" - // Merge all partition files into a single file. - segmentFileName = - mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp - val segmentFile = SegmentFileStore - .mergeSegmentFiles(readPath, - segmentFileName, - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath)) - if (segmentFile != null) { - SegmentFileStore - .moveFromTempFolder(segmentFile, - carbonLoadModel.getFactTimeStamp + ".tmp", - carbonLoadModel.getTablePath) + if (isMergeIndexEnabled) { + val segmentTmpFileName = carbonLoadModel.getFactTimeStamp + CarbonTablePath.SEGMENT_EXT + segmentFileName = mergedLoadNumber + "_" + segmentTmpFileName + val segmentTmpFile = FileFactory.getCarbonFile( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentTmpFileName)) + if (!segmentTmpFile.renameForce( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentFileName))) { + throw new Exception(s"Rename segment file from ${segmentTmpFileName} " + + s"to ${segmentFileName} failed.") + } + } else { + val readPath = + CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath) + + CarbonCommonConstants.FILE_SEPARATOR + carbonLoadModel.getFactTimeStamp + ".tmp" + // Merge all partition files into a single file. + segmentFileName = + mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp Review comment: use static constant from CarbonCommonConstants for "_", check and replace in other places also -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#discussion_r597409105 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala ########## @@ -296,26 +299,46 @@ class CarbonTableCompactor( if (finalMergeStatus) { val mergedLoadNumber = CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName) - var segmentFilesForIUDCompact = new util.ArrayList[Segment]() var segmentFileName: String = null + + val isMergeIndexEnabled = CarbonProperties.getInstance().getProperty( + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT, + CarbonCommonConstants.CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT).toBoolean + + if (compactionType != CompactionType.IUD_DELETE_DELTA && isMergeIndexEnabled) { + MergeIndexUtil.mergeIndexFilesOnCompaction(compactionCallableModel) + } + if (carbonTable.isHivePartitionTable) { - val readPath = - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath) + - CarbonCommonConstants.FILE_SEPARATOR + carbonLoadModel.getFactTimeStamp + ".tmp" - // Merge all partition files into a single file. - segmentFileName = - mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp - val segmentFile = SegmentFileStore - .mergeSegmentFiles(readPath, - segmentFileName, - CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath)) - if (segmentFile != null) { - SegmentFileStore - .moveFromTempFolder(segmentFile, - carbonLoadModel.getFactTimeStamp + ".tmp", - carbonLoadModel.getTablePath) + if (isMergeIndexEnabled) { + val segmentTmpFileName = carbonLoadModel.getFactTimeStamp + CarbonTablePath.SEGMENT_EXT + segmentFileName = mergedLoadNumber + "_" + segmentTmpFileName + val segmentTmpFile = FileFactory.getCarbonFile( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentTmpFileName)) + if (!segmentTmpFile.renameForce( + CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, segmentFileName))) { + throw new Exception(s"Rename segment file from ${segmentTmpFileName} " + + s"to ${segmentFileName} failed.") + } + } else { + val readPath = + CarbonTablePath.getSegmentFilesLocation(carbonLoadModel.getTablePath) + + CarbonCommonConstants.FILE_SEPARATOR + carbonLoadModel.getFactTimeStamp + ".tmp" + // Merge all partition files into a single file. + segmentFileName = + mergedLoadNumber + "_" + carbonLoadModel.getFactTimeStamp + val segmentFile = SegmentFileStore Review comment: ```suggestion val mergedSegmentFile = SegmentFileStore ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on a change in pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#discussion_r597410799 ########## File path: processing/src/main/java/org/apache/carbondata/processing/merger/CompactionResultSortProcessor.java ########## @@ -149,16 +149,26 @@ private CarbonColumn[] noDicAndComplexColumns; + private boolean needSegmentFile; Review comment: why this boolean is required? if any specific scenario is failing or being handled, please add a comment -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-802570355 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5596/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4104: URL: https://github.com/apache/carbondata/pull/4104#issuecomment-802572216 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3830/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |