vikramahuja1001 opened a new pull request #4035: URL: https://github.com/apache/carbondata/pull/4035

### Why is this PR needed?
Change the behaviour of the clean files operation.
Old behaviour: the clean files command is force by default and ignores the query timeout.

### What changes were proposed in this PR?
New behaviour:
- default clean files behaviour (clean files on table t1): clean MFD and Compacted segments after the query timeout (1 hr)
- clean files on table t1 options('force'='true'): clean MFD and Compacted segments immediately
- clean files on table t1 options('clean_inprogress'='true'): clean stale in-progress segments after 7 days (default behaviour)
- clean files on table t1 options('clean_inprogress'='true', 'force'='true'): clean MFD, Compacted and stale in-progress segments immediately

### Does this PR introduce any user interface change?
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No (previous test cases changed)

---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]
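The four option combinations above amount to a small decision table. The sketch below is illustrative only: the class, enum, and method names are hypothetical and do not correspond to CarbonData's actual implementation; `timeoutsExceeded` stands in for the combined trash-retention and query-timeout checks.

```java
// Hypothetical sketch of the clean-files behaviour matrix described in the
// PR description above; none of these names exist in CarbonData itself.
public class CleanFilesMatrix {

    enum SegmentKind { MARKED_FOR_DELETE, COMPACTED, STALE_IN_PROGRESS }

    /**
     * Decides whether a segment may be physically removed right now.
     *
     * @param force            the 'force' clean files option
     * @param cleanInProgress  the 'clean_inprogress' clean files option
     * @param timeoutsExceeded whether the retention / query-timeout window
     *                         for this segment has already passed
     */
    static boolean canClean(SegmentKind kind, boolean force,
                            boolean cleanInProgress, boolean timeoutsExceeded) {
        switch (kind) {
            case MARKED_FOR_DELETE:
            case COMPACTED:
                // removed immediately with 'force', otherwise only after timeout
                return force || timeoutsExceeded;
            case STALE_IN_PROGRESS:
                // never touched unless 'clean_inprogress' is set; immediate
                // only when 'force' is also set
                return cleanInProgress && (force || timeoutsExceeded);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // default behaviour: nothing is removed before the timeout passes
        if (canClean(SegmentKind.COMPACTED, false, false, false)) {
            throw new AssertionError();
        }
        // 'force'='true': MFD/Compacted go immediately, in-progress does not
        if (!canClean(SegmentKind.MARKED_FOR_DELETE, true, false, false)
                || canClean(SegmentKind.STALE_IN_PROGRESS, true, false, false)) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

Note how the table makes the key point of the review thread explicit: `force` alone never touches stale in-progress segments.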
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737107284 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3266/
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737107889 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5022/
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737176415 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5023/
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737177718 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3267/
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737253057 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5026/
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737256672 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3270/
QiangCai commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r534596239 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1095,7 +1099,8 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean // if execute command 'clean files' or the number of invisible segment info // exceeds the value of 'carbon.invisible.segments.preserve.count', // it need to append the invisible segment list to 'tablestatus.history' file. - if (isForceDeletion || (invisibleSegmentCnt > invisibleSegmentPreserveCnt)) { + if (cleanStaleInprogress || cleanCompactedAndMFD || (invisibleSegmentCnt > Review comment: no need to move load metadata to history always ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: why remove isForceDeletion?
########## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ########## @@ -173,40 +176,62 @@ public boolean accept(CarbonFile file) { } private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, - boolean isForceDelete) { - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() || - SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus()) - && oneLoad.getVisibility().equalsIgnoreCase("true")) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); + boolean cleanStaleInProgress, boolean cleanCompactedAndMFD) { + if (oneLoad.getVisibility().equalsIgnoreCase("true")) { + return checkLoadDeletionLogic(oneLoad, cleanCompactedAndMFD, cleanStaleInProgress); } - return false; } private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad, - boolean isForceDelete) { + boolean cleanCompactedAndMFD, boolean cleanStaleInProgress) { // Check if the segment is added externally and path is set then do not delete it - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() - || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null - || oneLoad.getPath().equalsIgnoreCase("NA"))) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); + if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) { + return checkLoadDeletionLogic(oneLoad, cleanCompactedAndMFD, cleanStaleInProgress); + } + return false; + } - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil + private static Boolean checkLoadDeletionLogic(LoadMetadataDetails 
oneLoad, + boolean cleanCompactedAndMFD, boolean cleanStaleInProgress) { + /* + * if cleanStaleInProgress == false and cleanCompactedAndMFD == false, clean MFD and Compacted + * segments after trashtimeout(7days) && query timeout(1 hr) + * if cleanStaleInProgress == false and cleanCompactedAndMFD == true, clean MFD and Compacted + * segments immediately + * if cleanStaleInProgress == true and cleanCompactedAndMFD == false, clean Stale Inprogress + * segments after 7 days(taking carbon.trash.retention.time value) + * if cleanStaleInProgress == true and cleanCompactedAndMFD == true, clean MFD, Compacted and + * stale inprogress segments immediately. + */ + if (!cleanCompactedAndMFD && !cleanStaleInProgress) { + if (SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus + .MARKED_FOR_DELETE == oneLoad.getSegmentStatus()) { + long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); + return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil .isMaxQueryTimeoutExceeded(deletionTime); - + } + return false; + } else if (cleanCompactedAndMFD && !cleanStaleInProgress) { + return SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus + .MARKED_FOR_DELETE == oneLoad.getSegmentStatus(); + } else if (!cleanCompactedAndMFD) { Review comment: It is hard to understand each if condition. please improve the whole if code. 
########## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ########## @@ -173,40 +176,62 @@ public boolean accept(CarbonFile file) { } private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, - boolean isForceDelete) { - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() || - SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus()) - && oneLoad.getVisibility().equalsIgnoreCase("true")) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); + boolean cleanStaleInProgress, boolean cleanCompactedAndMFD) { + if (oneLoad.getVisibility().equalsIgnoreCase("true")) { + return checkLoadDeletionLogic(oneLoad, cleanCompactedAndMFD, cleanStaleInProgress); } - return false; } private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad, - boolean isForceDelete) { + boolean cleanCompactedAndMFD, boolean cleanStaleInProgress) { // Check if the segment is added externally and path is set then do not delete it - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() - || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null - || oneLoad.getPath().equalsIgnoreCase("NA"))) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); + if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) { + return checkLoadDeletionLogic(oneLoad, cleanCompactedAndMFD, cleanStaleInProgress); + } + return false; + } - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil + private static Boolean checkLoadDeletionLogic(LoadMetadataDetails 
oneLoad, + boolean cleanCompactedAndMFD, boolean cleanStaleInProgress) { + /* + * if cleanStaleInProgress == false and cleanCompactedAndMFD == false, clean MFD and Compacted + * segments after trashtimeout(7days) && query timeout(1 hr) + * if cleanStaleInProgress == false and cleanCompactedAndMFD == true, clean MFD and Compacted + * segments immediately + * if cleanStaleInProgress == true and cleanCompactedAndMFD == false, clean Stale Inprogress + * segments after 7 days(taking carbon.trash.retention.time value) + * if cleanStaleInProgress == true and cleanCompactedAndMFD == true, clean MFD, Compacted and + * stale inprogress segments immediately. + */ + if (!cleanCompactedAndMFD && !cleanStaleInProgress) { + if (SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus + .MARKED_FOR_DELETE == oneLoad.getSegmentStatus()) { + long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); + return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil .isMaxQueryTimeoutExceeded(deletionTime); - + } + return false; + } else if (cleanCompactedAndMFD && !cleanStaleInProgress) { Review comment: cleanCompactedAndMFD means forceDelete ? ########## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ########## @@ -147,17 +147,28 @@ public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) * stale segment. Only comparing from tablestatus file, not checking tablestatus.history file */ private static void getStaleSegmentFiles(CarbonTable carbonTable, List<String> staleSegmentFiles, - List<String> redundantSegmentFile) { + List<String> redundantSegmentFile) throws IOException { String segmentFilesLocation = CarbonTablePath.getSegmentFilesLocation(carbonTable.getTablePath()); List<String> segmentFiles = Arrays.stream(FileFactory.getCarbonFile(segmentFilesLocation) .listFiles()).map(CarbonFile::getName).collect(Collectors.toList()); // there are no segments present in the Metadata folder. 
Can return here - if (segmentFiles.size() == 0) { - return; + // if table status file does not exist return + try { + if (segmentFiles.size() == 0 || !FileFactory.isFileExist(CarbonTablePath Review comment: maybe it is not required to check tablestatus
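One way to address the readability concern raised on `checkLoadDeletionLogic` is to branch on the segment status first and consult the two flags second. This is a hypothetical restructuring under the semantics quoted in the diff's comment block, not the PR's actual code; the `SegmentStatus` enum here is a local stand-in, and `timeoutsExceeded` is an assumed placeholder for the combined `TrashUtil.isTrashRetentionTimeoutExceeded` / `CarbonUpdateUtil.isMaxQueryTimeoutExceeded` checks.

```java
// Hypothetical restructuring of checkLoadDeletionLogic: one branch per
// segment status instead of one branch per flag combination.
// SegmentStatus and timeoutsExceeded are local stand-ins, not CarbonData code.
public class LoadDeletionSketch {

    enum SegmentStatus { MARKED_FOR_DELETE, COMPACTED, INSERT_IN_PROGRESS, SUCCESS }

    // stand-in for TrashUtil.isTrashRetentionTimeoutExceeded(..)
    //            && CarbonUpdateUtil.isMaxQueryTimeoutExceeded(..)
    static boolean timeoutsExceeded(long deletionTime) {
        return System.currentTimeMillis() - deletionTime > 7L * 24 * 60 * 60 * 1000;
    }

    static boolean canBeDeleted(SegmentStatus status, long deletionTime,
                                boolean cleanCompactedAndMFD,
                                boolean cleanStaleInProgress) {
        switch (status) {
            case MARKED_FOR_DELETE:
            case COMPACTED:
                // immediate when the flag is set, otherwise wait out the timeouts
                return cleanCompactedAndMFD || timeoutsExceeded(deletionTime);
            case INSERT_IN_PROGRESS:
                // only when explicitly requested; immediate only with both flags
                return cleanStaleInProgress
                        && (cleanCompactedAndMFD || timeoutsExceeded(deletionTime));
            default:
                return false;  // healthy segments are never cleaned here
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // both flags set behaves like the old force option
        if (!canBeDeleted(SegmentStatus.INSERT_IN_PROGRESS, now, true, true)) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

Whether a restructuring like this covers every edge of the PR (for example the `INSERT_OVERWRITE_IN_PROGRESS` status) would still need to be checked against the original branches.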
QiangCai commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737611158 @vikramahuja1001 please change the title to describe what you changed
QiangCai commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r534596239 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1095,7 +1099,8 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean // if execute command 'clean files' or the number of invisible segment info // exceeds the value of 'carbon.invisible.segments.preserve.count', // it need to append the invisible segment list to 'tablestatus.history' file. - if (isForceDeletion || (invisibleSegmentCnt > invisibleSegmentPreserveCnt)) { + if (cleanStaleInprogress || cleanCompactedAndMFD || (invisibleSegmentCnt > Review comment: do clean files always move load metadata to history?
vikramahuja1001 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r534678032 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: Instead of a single force option deleting all MFD, Compacted and stale Insert In Progress segments, we divide it into 2 parameters: cleanMFDandCompacted and cleanStaleInProgress. The cleanMFDAndCompacted parameter decides whether MFD and Compacted segments can be deleted, and cleanStaleInProgress decides whether stale In Progress segments can be deleted. With these 2 parameters, force can be removed: if the user wants to do a force deletion, they can set both cleanMFDandCompacted and cleanStaleInProgress to true.
vikramahuja1001 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r534678759 ########## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ########## @@ -173,40 +176,62 @@ public boolean accept(CarbonFile file) { } private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, - boolean isForceDelete) { - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() || - SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus()) - && oneLoad.getVisibility().equalsIgnoreCase("true")) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); + boolean cleanStaleInProgress, boolean cleanCompactedAndMFD) { + if (oneLoad.getVisibility().equalsIgnoreCase("true")) { + return checkLoadDeletionLogic(oneLoad, cleanCompactedAndMFD, cleanStaleInProgress); } - return false; } private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad, - boolean isForceDelete) { + boolean cleanCompactedAndMFD, boolean cleanStaleInProgress) { // Check if the segment is added externally and path is set then do not delete it - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() - || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null - || oneLoad.getPath().equalsIgnoreCase("NA"))) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); + if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) { + return checkLoadDeletionLogic(oneLoad, cleanCompactedAndMFD, cleanStaleInProgress); + } + return false; + } - return 
TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil + private static Boolean checkLoadDeletionLogic(LoadMetadataDetails oneLoad, + boolean cleanCompactedAndMFD, boolean cleanStaleInProgress) { + /* + * if cleanStaleInProgress == false and cleanCompactedAndMFD == false, clean MFD and Compacted + * segments after trashtimeout(7days) && query timeout(1 hr) + * if cleanStaleInProgress == false and cleanCompactedAndMFD == true, clean MFD and Compacted + * segments immediately + * if cleanStaleInProgress == true and cleanCompactedAndMFD == false, clean Stale Inprogress + * segments after 7 days(taking carbon.trash.retention.time value) + * if cleanStaleInProgress == true and cleanCompactedAndMFD == true, clean MFD, Compacted and + * stale inprogress segments immediately. + */ + if (!cleanCompactedAndMFD && !cleanStaleInProgress) { + if (SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus + .MARKED_FOR_DELETE == oneLoad.getSegmentStatus()) { + long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); + return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil .isMaxQueryTimeoutExceeded(deletionTime); - + } + return false; + } else if (cleanCompactedAndMFD && !cleanStaleInProgress) { Review comment: previously forceDelete would delete MFD, Compacted and stale in-progress segments immediately. cleanCompactedAndMFD means only delete MFD and Compacted (immediately), and cleanStaleInProgress means only delete stale in-progress segments.
vikramahuja1001 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r534678969 ########## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ########## @@ -173,40 +176,62 @@ public boolean accept(CarbonFile file) { } private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, - boolean isForceDelete) { - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() || - SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus()) - && oneLoad.getVisibility().equalsIgnoreCase("true")) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); + boolean cleanStaleInProgress, boolean cleanCompactedAndMFD) { + if (oneLoad.getVisibility().equalsIgnoreCase("true")) { + return checkLoadDeletionLogic(oneLoad, cleanCompactedAndMFD, cleanStaleInProgress); } - return false; } private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad, - boolean isForceDelete) { + boolean cleanCompactedAndMFD, boolean cleanStaleInProgress) { // Check if the segment is added externally and path is set then do not delete it - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() - || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null - || oneLoad.getPath().equalsIgnoreCase("NA"))) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); + if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) { + return checkLoadDeletionLogic(oneLoad, cleanCompactedAndMFD, cleanStaleInProgress); + } + return false; + } - return 
TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil + private static Boolean checkLoadDeletionLogic(LoadMetadataDetails oneLoad, + boolean cleanCompactedAndMFD, boolean cleanStaleInProgress) { + /* + * if cleanStaleInProgress == false and cleanCompactedAndMFD == false, clean MFD and Compacted + * segments after trashtimeout(7days) && query timeout(1 hr) + * if cleanStaleInProgress == false and cleanCompactedAndMFD == true, clean MFD and Compacted + * segments immediately + * if cleanStaleInProgress == true and cleanCompactedAndMFD == false, clean Stale Inprogress + * segments after 7 days(taking carbon.trash.retention.time value) + * if cleanStaleInProgress == true and cleanCompactedAndMFD == true, clean MFD, Compacted and + * stale inprogress segments immediately. + */ + if (!cleanCompactedAndMFD && !cleanStaleInProgress) { + if (SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus + .MARKED_FOR_DELETE == oneLoad.getSegmentStatus()) { + long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); + return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil .isMaxQueryTimeoutExceeded(deletionTime); - + } + return false; + } else if (cleanCompactedAndMFD && !cleanStaleInProgress) { + return SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus + .MARKED_FOR_DELETE == oneLoad.getSegmentStatus(); + } else if (!cleanCompactedAndMFD) { Review comment: I have added comments in the code to tell what exactly is happening and also added it in the PR description. Please refer to them once.
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737723330 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5037/
CarbonDataQA2 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737723996 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3280/
akashrn5 commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r534788928 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean + cleanCompactedAndMFD, CarbonTable carbonTable, AbsoluteTableIdentifier + absoluteTableIdentifier, LoadMetadataDetails[] details) { // Delete marked loads boolean isUpdateRequired = DeleteLoadFolders - .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details, - carbonTable.getMetadataPath()); + .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress, + cleanCompactedAndMFD, details, carbonTable.getMetadataPath()); return new ReturnTuple(details, isUpdateRequired); } - public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isForceDeletion, - List<PartitionSpec> partitionSpecs) throws IOException { + public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, + List<PartitionSpec> partitionSpecs, Boolean cleanStaleInprogress, + Boolean cleanCompactedAndMFD) throws IOException { Review comment: please correct the format here ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean 
cleanStaleInProgress, boolean + cleanCompactedAndMFD, CarbonTable carbonTable, AbsoluteTableIdentifier + absoluteTableIdentifier, LoadMetadataDetails[] details) { // Delete marked loads boolean isUpdateRequired = DeleteLoadFolders - .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details, - carbonTable.getMetadataPath()); + .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress, + cleanCompactedAndMFD, details, carbonTable.getMetadataPath()); return new ReturnTuple(details, isUpdateRequired); } - public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isForceDeletion, - List<PartitionSpec> partitionSpecs) throws IOException { + public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, + List<PartitionSpec> partitionSpecs, Boolean cleanStaleInprogress, + Boolean cleanCompactedAndMFD) throws IOException { Review comment: wherever you are calling these methods, please name the Boolean variables ########## File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java ########## @@ -173,40 +176,68 @@ public boolean accept(CarbonFile file) { } private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad, - boolean isForceDelete) { - if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() || - SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || - SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus()) - && oneLoad.getVisibility().equalsIgnoreCase("true")) { - if (isForceDelete) { - return true; - } - long deletionTime = oneLoad.getModificationOrDeletionTimestamp(); - return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil - .isMaxQueryTimeoutExceeded(deletionTime); + boolean cleanStaleInProgress, boolean cleanCompactedAndMFD) { + if (oneLoad.getVisibility().equalsIgnoreCase("true")) { Review comment: here checking
visibility to true, but when I delete a segment by id or date, we set the visibility to false; how is that case taken care of? Is it already handled? ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1095,7 +1099,8 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean // if execute command 'clean files' or the number of invisible segment info // exceeds the value of 'carbon.invisible.segments.preserve.count', // it need to append the invisible segment list to 'tablestatus.history' file. - if (isForceDeletion || (invisibleSegmentCnt > invisibleSegmentPreserveCnt)) { + if (cleanStaleInprogress || cleanCompactedAndMFD || (invisibleSegmentCnt > Review comment: `cleanStaleInprogress || cleanCompactedAndMFD` is this condition really required here, as the functionality is just moving to the history file? ########## File path: integration/spark/src/main/scala/org/apache/carbondata/events/CleanFilesEvents.scala ########## @@ -32,7 +32,10 @@ case class CleanFilesPreEvent(carbonTable: CarbonTable, sparkSession: SparkSessi /** * * @param carbonTable + * @param cleanStaleInProgress + * @param cleanCompactedAndMFD Review comment: if not adding any description, remove these; the variable names already depict what they are for ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala ########## @@ -58,8 +58,10 @@ case class CarbonCleanFilesCommand( var carbonTable: CarbonTable = _ var cleanFileCommands: List[CarbonCleanFilesCommand] = List.empty val optionsMap = options.getOrElse(List.empty[(String, String)]).toMap - // forceClean will empty trash + // forceClean will clean the MFD and Compacted segments immediately val forceClean = optionsMap.getOrElse("force", "false").toBoolean + // stale_inprogress willclean the In Progress segments immediately Review comment: the comment is wrong, right?
Because when `stale_inprogress = true`, you will not clean immediately; you will wait for the retention time. Only when both options are true will you delete immediately. Please update it accordingly. ########## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ########## @@ -571,7 +571,7 @@ object CommonUtil { try { val carbonTable = CarbonMetadata.getInstance .getCarbonTable(tableUniqueName) - SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable, true, null) + SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable, null, true, false) Review comment: please name the boolean variables, as in the first comment ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala ########## @@ -117,9 +119,9 @@ case class CarbonCleanFilesCommand( CleanFilesUtil.cleanStaleSegments(carbonTable) } if (forceTableClean) { Review comment: can we remove this `forceTableClean`? I don't see anyone using this variable or passing it as true; it's always false ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, Review comment: there is a method `writeLoadMetadata` which takes an identifier as input; it is an unused method, please check and remove that code ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ########## @@ -125,7 +125,7 @@ case class CarbonLoadDataCommand(databaseNameOp: Option[String], updateModel = None, operationContext = operationContext) // Clean up the old invalid segment data before creating a new entry for new load.
- SegmentStatusManager.deleteLoadsAndUpdateMetadata(table, false, currPartitions) + SegmentStatusManager.deleteLoadsAndUpdateMetadata(table, currPartitions, false, false) Review comment: same as above ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ########## @@ -462,7 +462,7 @@ object SecondaryIndexCreator { try { if (!isCompactionCall) { SegmentStatusManager - .deleteLoadsAndUpdateMetadata(indexCarbonTable, false, null) + .deleteLoadsAndUpdateMetadata(indexCarbonTable, null, false, false) Review comment: name the boolean variables ########## File path: integration/spark/src/main/scala/org/apache/spark/util/CleanFiles.scala ########## @@ -39,7 +39,8 @@ object CleanFiles { * drop table from hive metastore so should be very careful to use it. */ def cleanFiles(spark: SparkSession, dbName: String, tableName: String, - forceTableClean: Boolean = false): Unit = { + forceTableClean: Boolean = false, cleanCompactedAndMFD: Boolean = false, + cleanStaleInProgress: Boolean = false ): Unit = { Review comment: please reformat the code ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala ########## @@ -346,4 +346,10 @@ class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll { sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1, "name"""") } + def removeSegmentEntryFromTableStatusFile(carbonTable: CarbonTable, segmentNo: String) : Unit = { Review comment: I see that this method is used in multiple places; please move it to a utility method and use it everywhere ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala ########## @@ -212,14 +211,16 @@ class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll { test("test trash folder with 2 segments with same segment number") { createTable() - sql(s"""INSERT INTO CLEANTEST SELECT "1", 2, "name"""") + loadData()
val path = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext.sparkSession) .getTablePath val trashFolderPath = CarbonTablePath.getTrashFolderPath(path) assert(!FileFactory.isFileExist(trashFolderPath)) // All 4 segments are made as stale segments, they should be moved to the trash folder - deleteTableStatusFile(path) + // deleteTableStatusFile(path) Review comment: please check all places ########## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ########## @@ -147,17 +147,28 @@ public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) * stale segment. Only comparing from tablestatus file, not checking tablestatus.history file */ private static void getStaleSegmentFiles(CarbonTable carbonTable, List<String> staleSegmentFiles, - List<String> redundantSegmentFile) { + List<String> redundantSegmentFile) throws IOException { String segmentFilesLocation = CarbonTablePath.getSegmentFilesLocation(carbonTable.getTablePath()); List<String> segmentFiles = Arrays.stream(FileFactory.getCarbonFile(segmentFilesLocation) .listFiles()).map(CarbonFile::getName).collect(Collectors.toList()); // there are no segments present in the Metadata folder. Can return here - if (segmentFiles.size() == 0) { - return; + // if table status file does not exist return + try { + if (segmentFiles.size() == 0 || !FileFactory.isFileExist(CarbonTablePath Review comment: agree with @QiangCai; the line below will already take care of it ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoWithDf.scala ########## @@ -108,7 +108,7 @@ case class CarbonInsertIntoWithDf(databaseNameOp: Option[String], operationContext = operationContext) // Clean up the old invalid segment data before creating a new entry for new load.
- SegmentStatusManager.deleteLoadsAndUpdateMetadata(table, false, currPartitions) + SegmentStatusManager.deleteLoadsAndUpdateMetadata(table, currPartitions, false, false) Review comment: name boolean ########## File path: core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java ########## @@ -147,17 +147,28 @@ public static void cleanStaleSegmentsForPartitionTable(CarbonTable carbonTable) * stale segment. Only comparing from tablestatus file, not checking tablestatus.history file */ private static void getStaleSegmentFiles(CarbonTable carbonTable, List<String> staleSegmentFiles, - List<String> redundantSegmentFile) { + List<String> redundantSegmentFile) throws IOException { String segmentFilesLocation = CarbonTablePath.getSegmentFilesLocation(carbonTable.getTablePath()); List<String> segmentFiles = Arrays.stream(FileFactory.getCarbonFile(segmentFilesLocation) .listFiles()).map(CarbonFile::getName).collect(Collectors.toList()); // there are no segments present in the Metadata folder. 
Can return here - if (segmentFiles.size() == 0) { - return; + // if table status file does not exist return + try { + if (segmentFiles.size() == 0 || !FileFactory.isFileExist(CarbonTablePath + .getTableStatusFilePath(carbonTable.getTablePath()))) { + return; + } + } catch (IOException e) { + LOGGER.error("Unable to Check if tablestatus file exists while getting stale segments", e); + throw e; } LoadMetadataDetails[] details = SegmentStatusManager.readLoadMetadata(carbonTable .getMetadataPath()); + // return if table status file is empty Review comment: change the comment; there won't be any scenario where the table status file is empty ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala ########## @@ -90,25 +100,21 @@ class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll { val trashFolderPath = CarbonTablePath.getTrashFolderPath(path) assert(!FileFactory.isFileExist(trashFolderPath)) // All 4 segments are made as stale segments and should be moved to trash - deleteTableStatusFile(path) + // deleteTableStatusFile(path) Review comment: if not required, remove it instead of commenting it out ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala ########## @@ -151,34 +157,31 @@ class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll { val mainTablePath = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext .sparkSession).getTablePath - deleteTableStatusFile(mainTablePath) + // deleteTableStatusFile(mainTablePath) Review comment: same as above
akashrn5 commented on pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#issuecomment-737734515 @vikramahuja1001 I don't see any test case with `clean_inprgress = true`; please add the missing test cases
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535042468 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean Review comment: @vikramahuja1001: please update the clean files documentation. > 1) default clean files behaviour (clean files on table t1): cleaning MFD and Compacted segments will depend on the query timeout (1 hr) and trashRetentionTimeout (7 days, default). 2) clean files on table t1 options('force'='true'): clean MFD and Compacted segments immediately (do not check for any timeout) 3) clean files on table t1 options('clean_inprgress'='true'): clean stale in-progress segments depending on trashRetentionTimeout, after 7 days (default behaviour) 4) clean files on table t1 options('clean_inprgress'='true', 'force'='true'): clean MFD, Compacted and stale in-progress segments immediately (do not check for any timeout) **my comments on the description** 1) you have not mentioned the trash folder in any of these; mention what happens to the trash in each case. 2) and in your point 3), we should also delete the expired MFD, Compacted segments and expired trash folders.
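The four behaviours enumerated above amount to a per-segment eligibility check. The Java sketch below illustrates that policy only; the class name, enum, constants, and method are hypothetical, not CarbonData's actual implementation (which lives in `DeleteLoadFolders.checkIfLoadCanBeDeleted` and consults `TrashUtil`/`CarbonUpdateUtil`). For brevity it applies a single timeout per status, although the review notes the default path checks both the query timeout and the trash-retention timeout.

```java
// Hypothetical sketch of the clean-files eligibility policy discussed above.
public class CleanFilesPolicySketch {

  static final long QUERY_TIMEOUT_MS = 60L * 60 * 1000;            // 1 hour
  static final long TRASH_RETENTION_MS = 7L * 24 * 60 * 60 * 1000; // 7 days

  enum Status { MARKED_FOR_DELETE, COMPACTED, INSERT_IN_PROGRESS, SUCCESS }

  // Returns true when a visible segment may be physically cleaned.
  // ageMs is the time elapsed since the segment's modification/deletion timestamp.
  static boolean canClean(Status status, long ageMs,
                          boolean force, boolean cleanStaleInProgress) {
    switch (status) {
      case MARKED_FOR_DELETE:
      case COMPACTED:
        // behaviours 1 and 2: wait for the timeout unless 'force' is set
        return force || ageMs > QUERY_TIMEOUT_MS;
      case INSERT_IN_PROGRESS:
        // behaviours 3 and 4: only when explicitly requested, and
        // immediately only when 'force' is also set
        return cleanStaleInProgress && (force || ageMs > TRASH_RETENTION_MS);
      default:
        return false; // valid segments are never cleaned
    }
  }

  public static void main(String[] args) {
    // behaviour 2: 'force' cleans an MFD segment immediately
    System.out.println(canClean(Status.MARKED_FOR_DELETE, 0, true, false));  // prints true
    // behaviour 3 alone does not touch a fresh in-progress segment
    System.out.println(canClean(Status.INSERT_IN_PROGRESS, 0, false, true)); // prints false
  }
}
```

This makes the reviewers' point concrete: with such a shape, a single `isForceDelete` flag cannot express "clean stale in-progress only after retention", which is why the PR splits it into two options.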
ajantha-bhat commented on a change in pull request #4035: URL: https://github.com/apache/carbondata/pull/4035#discussion_r535043361 ########## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ########## @@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier, } } - private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable, - AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) { + private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean + cleanCompactedAndMFD, CarbonTable carbonTable, AbsoluteTableIdentifier + absoluteTableIdentifier, LoadMetadataDetails[] details) { // Delete marked loads boolean isUpdateRequired = DeleteLoadFolders - .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details, - carbonTable.getMetadataPath()); + .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress, + cleanCompactedAndMFD, details, carbonTable.getMetadataPath()); Review comment: the `cleanCompactedAndMFD` flag is not needed, since we need to clean these segments every time once you fix your point 3