Karan980 opened a new pull request #4070: URL: https://github.com/apache/carbondata/pull/4070 ### Why is this PR needed? When a segment that contains a deleteDelta file is added to a carbon table via an ALTER TABLE ADD SEGMENT query, querying the table returns the deleted rows in the result. ### What changes were proposed in this PR? Update the tableStatus and tableUpdateStatus files correctly for segments that have delete delta files. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]
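The reported scenario can be sketched as a minimal repro in the style of the AddSegmentTestCase suite. This is an illustrative sketch only: it assumes a Carbon-enabled Spark session where `sql(...)` is available, and the table names and path are hypothetical.

```scala
// Illustrative repro of the reported bug (table names and path are hypothetical).
// A segment from a table that had DELETE run on it carries a .deletedelta file
// alongside its data files.
sql("CREATE TABLE src (id INT, name STRING) STORED AS carbondata")
sql("INSERT INTO src SELECT 1, 'a'")
sql("INSERT INTO src SELECT 2, 'b'")
sql("DELETE FROM src WHERE id = 1") // writes a delete delta file into the segment

// Copy the segment directory of src to an external location, then attach it:
sql("ALTER TABLE dest ADD SEGMENT OPTIONS('path'='/tmp/ext_seg', 'format'='carbon')")

// Before this fix, the deleted row (id = 1) reappeared in this query, because
// the tableUpdateStatus metadata did not account for the segment's delete delta.
sql("SELECT * FROM dest").show()
```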
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-753829271 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5272/
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-753830863 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3511/
vikramahuja1001 commented on a change in pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#discussion_r551243624

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala ##########

@@ -82,6 +82,59 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     FileFactory.deleteAllFilesOfDir(new File(newPath))
   }

+  test("Test add segment for the segment having delete delta files") {

Review comment: Add a test case for update too

Karan980 commented on a change in pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#discussion_r551275838

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala ##########

@@ -82,6 +82,59 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     FileFactory.deleteAllFilesOfDir(new File(newPath))
   }

+  test("Test add segment for the segment having delete delta files") {

Review comment: Done
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-753973981 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5277/
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-753976459 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3516/
QiangCai commented on a change in pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#discussion_r553073502

########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala ##########

@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()
+      .filter(_.getName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+    if (deltaFiles.length > 0) {
+      val blockNameToDeltaFilesMap =
+        collection.mutable.Map[String, collection.mutable.ListBuffer[(CarbonFile, String)]]()
+      deltaFiles.foreach { deltaFile =>
+        val tmpDeltaFilePath = deltaFile.getAbsolutePath
+          .replace(CarbonCommonConstants.WINDOWS_FILE_SEPARATOR,
+            CarbonCommonConstants.FILE_SEPARATOR)
+        val deltaFilePathElements = tmpDeltaFilePath.split(CarbonCommonConstants.FILE_SEPARATOR)
+        if (deltaFilePathElements != null && deltaFilePathElements.nonEmpty) {
+          val deltaFileName = deltaFilePathElements(deltaFilePathElements.length - 1)
+          val blockName = CarbonTablePath.DataFileUtil
+            .getBlockNameFromDeleteDeltaFile(deltaFileName)
+          if (blockNameToDeltaFilesMap.contains(blockName)) {
+            blockNameToDeltaFilesMap(blockName) += ((deltaFile, deltaFileName))
+          } else {
+            val deltaFileList = new ListBuffer[(CarbonFile, String)]()
+            deltaFileList += ((deltaFile, deltaFileName))
+            blockNameToDeltaFilesMap.put(blockName, deltaFileList)
+          }
+        }
+      }
+      val segmentUpdateDetails = new util.ArrayList[SegmentUpdateDetails]()
+      val columnCompressor = CompressorFactory.getInstance.getCompressor.getName
+      blockNameToDeltaFilesMap.foreach { entry =>
+        val segmentUpdateDetail = new SegmentUpdateDetails()
+        segmentUpdateDetail.setBlockName(entry._1)
+        segmentUpdateDetail.setActualBlockName(
+          entry._1 + CarbonCommonConstants.POINT + columnCompressor +
+            CarbonCommonConstants.FACT_FILE_EXT)
+        segmentUpdateDetail.setSegmentName(model.getSegmentId)
+        setMinMaxDeltaStampAndDeletedRowCount(entry._2, segmentUpdateDetail)
+        segmentUpdateDetails.add(segmentUpdateDetail)
+      }
+      val timestamp = System.currentTimeMillis().toString
+      val segmentDetails = new util.HashSet[Segment]()
+      segmentDetails.add(model.getSegment)
+      CarbonUpdateUtil.updateSegmentStatus(segmentUpdateDetails, carbonTable, timestamp, false)

Review comment: can we write the tablestatus file once during the add load command?
QiangCai commented on a change in pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#discussion_r553074562

########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala ##########

@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment: how do we know which deletedelta files are valid?
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-757531715 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3532/
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-757531962 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5292/
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-759238242 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3546/
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-759239290 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5306/
Karan980 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762604221 retest this please
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762634833 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5313/
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762637383 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3553/
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-763751165 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5322/
CarbonDataQA2 commented on pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#issuecomment-763757918 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3562/
Karan980 commented on a change in pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#discussion_r561607408

########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala ##########

@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment: Now I'm taking just one delete delta file (the one with the highest timestamp) for each block, assuming the horizontal compaction threshold has not been changed from its default. In the SDK case, all files are valid because the SDK does not perform horizontal compaction.