vikramahuja1001 opened a new pull request #4051: URL: https://github.com/apache/carbondata/pull/4051 ### Why is this PR needed? While getting stale segment, we do a list files and take all the files, if there is any folder/file other than .segment file, it will lead to further issues while copying data to the trash folder. ### What changes were proposed in this PR? Added a filter, to only consider the files ending with ".segment" ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA2 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-742442851 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5137/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-742453013 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3375/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
QiangCai commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-742907587 can you add a test case for fault testing? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
vikramahuja1001 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745189584 @QiangCai , i have changed current test cases itself. Please check ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745238696 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5166/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745241197 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3404/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745308252 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5168/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745308552 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3406/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745312384 @vikramahuja1001 : when the issue is found, what other file was present in segment metadata directory ? how it is impacting ? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
vikramahuja1001 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745790657 @ajantha-bhat , so in the case of partition table, we have a tmp directory in the segment folder. In this case, while we check for stale segments, we are blindly reading all the files in the segment folder and collecting the name of it and then based on that we read the segment files, since it is not a file, thus it cannot be read and would give an exception. In this fix, we just need to filter if the file ends with ".segment" ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#discussion_r544064420 ########## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ########## @@ -157,11 +157,12 @@ public static void deleteExpiredDataFromTrash(String tablePath) { // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp. try { if (FileFactory.isFileExist(trashPath)) { Review comment: ```suggestion CarbonFile trashFile = FileFactory.getCarbonFile(trashPath); if (trashFile.exists()) { CarbonFile[] timestampFolderList = trashFile.listFiles(); ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#discussion_r544064935 ########## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ########## @@ -181,7 +182,7 @@ public static void emptyTrash(String tablePath) { // if the trash folder exists delete the contents of the trash folder try { if (FileFactory.isFileExist(trashPath)) { Review comment: handle same as above comment ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#discussion_r544069025 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala ########## @@ -38,26 +40,33 @@ case class CarbonCleanFilesCommand( isInternalCleanCall: Boolean = false) extends DataCommand { + val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + override def processData(sparkSession: SparkSession): Seq[Row] = { Checker.validateTableExists(databaseNameOp, tableName, sparkSession) val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName)(sparkSession) setAuditTable(carbonTable) - // if insert overwrite in progress, do not allow delete segment - if (SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) { + // if insert overwrite in progress and table not a MV, do not allow delete segment + if (!carbonTable.isMV && SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) { throw new ConcurrentOperationException(carbonTable, "insert overwrite", "clean file") } if (!carbonTable.getTableInfo.isTransactionalTable) { throw new MalformedCarbonCommandException("Unsupported operation on non transactional table") } - val preEvent = CleanFilesPreEvent(carbonTable, sparkSession) - val postEvent = CleanFilesPostEvent(carbonTable, sparkSession, options) - withEvents(preEvent, postEvent) { - DataTrashManager.cleanGarbageData( - carbonTable, - options.getOrElse("force", "false").toBoolean, - options.getOrElse("stale_inprogress", "false").toBoolean, - CarbonFilters.getPartitions(Seq.empty[Expression], sparkSession, carbonTable)) + // only proceed if not a MV and if insert overwrite not in progress + if (!carbonTable.isMV && !SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) { + val preEvent = CleanFilesPreEvent(carbonTable, sparkSession) + val postEvent = CleanFilesPostEvent(carbonTable, sparkSession, options) + withEvents(preEvent, postEvent) { + DataTrashManager.cleanGarbageData( + carbonTable, + options.getOrElse("force", "false").toBoolean, + options.getOrElse("stale_inprogress", "false").toBoolean, + CarbonFilters.getPartitions(Seq.empty[Expression], sparkSession, carbonTable)) + } + } else { + LOGGER.info(s"Can not do clean files operation for the MV: ${carbonTable.getTableName}") Review comment: Clean files for MV is not supported? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#discussion_r544072761 ########## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ########## @@ -157,11 +157,12 @@ public static void deleteExpiredDataFromTrash(String tablePath) { // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp. try { if (FileFactory.isFileExist(trashPath)) { Review comment: I think we should deprecate listfiles and fileExist and other API in file factory that takes path, it has to take carbon file instead of path. This way we can force user to think about reusing the carbonFile Object. we can raise a JIRA and assign some new contributors. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#discussion_r544073512 ########## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ########## @@ -157,11 +157,12 @@ public static void deleteExpiredDataFromTrash(String tablePath) { // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp. try { if (FileFactory.isFileExist(trashPath)) { Review comment: currently Vikram can handle for his PR by accepting suggested changes by Indhu. But many places we are not resuing the CarbonFile object. that can be handled by my above point. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#discussion_r544074748 ########## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ########## @@ -157,11 +157,12 @@ public static void deleteExpiredDataFromTrash(String tablePath) { // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp. try { if (FileFactory.isFileExist(trashPath)) { Review comment: https://issues.apache.org/jira/browse/CARBONDATA-4086 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745862799 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5175/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#issuecomment-745867314 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3413/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
vikramahuja1001 commented on a change in pull request #4051: URL: https://github.com/apache/carbondata/pull/4051#discussion_r544169226 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala ########## @@ -38,26 +40,33 @@ case class CarbonCleanFilesCommand( isInternalCleanCall: Boolean = false) extends DataCommand { + val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + override def processData(sparkSession: SparkSession): Seq[Row] = { Checker.validateTableExists(databaseNameOp, tableName, sparkSession) val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName)(sparkSession) setAuditTable(carbonTable) - // if insert overwrite in progress, do not allow delete segment - if (SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) { + // if insert overwrite in progress and table not a MV, do not allow delete segment + if (!carbonTable.isMV && SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) { throw new ConcurrentOperationException(carbonTable, "insert overwrite", "clean file") } if (!carbonTable.getTableInfo.isTransactionalTable) { throw new MalformedCarbonCommandException("Unsupported operation on non transactional table") } - val preEvent = CleanFilesPreEvent(carbonTable, sparkSession) - val postEvent = CleanFilesPostEvent(carbonTable, sparkSession, options) - withEvents(preEvent, postEvent) { - DataTrashManager.cleanGarbageData( - carbonTable, - options.getOrElse("force", "false").toBoolean, - options.getOrElse("stale_inprogress", "false").toBoolean, - CarbonFilters.getPartitions(Seq.empty[Expression], sparkSession, carbonTable)) + // only proceed if not a MV and if insert overwrite not in progress + if (!carbonTable.isMV && !SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) { + val preEvent = CleanFilesPreEvent(carbonTable, sparkSession) + val postEvent = CleanFilesPostEvent(carbonTable, sparkSession, options) + withEvents(preEvent, postEvent) { + DataTrashManager.cleanGarbageData( + carbonTable, + options.getOrElse("force", "false").toBoolean, + options.getOrElse("stale_inprogress", "false").toBoolean, + CarbonFilters.getPartitions(Seq.empty[Expression], sparkSession, carbonTable)) + } + } else { + LOGGER.info(s"Can not do clean files operation for the MV: ${carbonTable.getTableName}") Review comment: changed code ########## File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java ########## @@ -181,7 +182,7 @@ public static void emptyTrash(String tablePath) { // if the trash folder exists delete the contents of the trash folder try { if (FileFactory.isFileExist(trashPath)) { Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |