[GitHub] [carbondata] vikramahuja1001 opened a new pull request #4035: [WIP]: CleanFiles Behaviour Change

classic Classic list List threaded Threaded
64 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox

vikramahuja1001 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535054960



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
       1. Trash behaviour remains unchanged. Trash will be deleted automatically when option force = true, it's already in the document, If force = true is not given, then it will delete trash according to timestamp retention property.
   2. in_progress = true, should only delete in_progress segments. Expired trash will be deleted anyways.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535062363



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
       If we don't clean expired MFD, expired compacted and expired trash along with expired stale_inprogress. we have to call clean files at least two times to clean all expired data. I feel it is not good.
   
   @QiangCai @akashrn5 : what you think ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535062363



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
       in point 3), If we don't clean expired MFD, expired compacted and expired trash along with expired stale_inprogress. we have to call clean files at least two times to clean all expired data. I feel it is not good.
   
   @QiangCai @akashrn5 : what you think ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535066388



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
       @QiangCai @akashrn5 , can you clarify if we should delete the expired MFD, compacted in point 3?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535070890



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
       agree with @ajantha-bhat , the clean files with options, let it do its specific functionality, along with this, if others expired we should do that also, to save time for users.It should be like (1+2), (1+3) and (1+4). I think this will be better




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r534678032



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
       Instead of just a force option deleting all MFD, Compacted and stale Insert In Progress segments, dividing them into 2 parameters, cleanMFDandCompacted and cleanStaleInProgress. CleanMFDAndCompacted parameter will decide if MFD and Compacted segments can be deleted and cleanStaleInProgress will decide if stale InProgress segments can be deleted. After giving these 2 parameters, force can be removed. if the user wants to do force deletion, they can set both cleanMFDandCompacted and cleanStaleInProgress to true.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535321147



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -90,13 +93,13 @@ public static void physicalFactAndMeasureMetadataDeletion(CarbonTable carbonTabl
    * Delete the invalid data physically from table.
    * @param carbonTable table
    * @param loadDetails Load details which need clean up
-   * @param isForceDelete is Force delete requested by user
+   * @param force Force delete Compacted and MFD segments

Review comment:
       force --> isForceDelete
   Also it will force clean the trash. add that also in header description

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -68,19 +68,22 @@ private static String getSegmentPath(AbsoluteTableIdentifier identifier,
 
   public static void physicalFactAndMeasureMetadataDeletion(CarbonTable carbonTable,
       LoadMetadataDetails[] newAddedLoadHistoryList,
-      boolean isForceDelete,
+      boolean isForceDeletion,

Review comment:
       keep it same as previous `isForceDelete`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535332006



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +999,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean
+      isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier
+      absoluteTableIdentifier, LoadMetadataDetails[] details) {
     // Delete marked loads
     boolean isUpdateRequired = DeleteLoadFolders
-        .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details,
-            carbonTable.getMetadataPath());
+        .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress,

Review comment:
       keep the original order of arguments (identifier, isForceDeletion, cleanStaleInProgress, deatils, path), to avoid sending wrong arguments values from callers




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535333805



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +999,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean
+      isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier
+      absoluteTableIdentifier, LoadMetadataDetails[] details) {
     // Delete marked loads
     boolean isUpdateRequired = DeleteLoadFolders
-        .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details,
-            carbonTable.getMetadataPath());
+        .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress,

Review comment:
       check for all the places and handle




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535334869



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1039,17 +999,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
     }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean
+      isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier
+      absoluteTableIdentifier, LoadMetadataDetails[] details) {
     // Delete marked loads
     boolean isUpdateRequired = DeleteLoadFolders
-        .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details,
-            carbonTable.getMetadataPath());
+        .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress,

Review comment:
       it is always advisable to add a new argument at the end




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535348536



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) {
   }
 
   private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
-      boolean isForceDelete) {
-    if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() ||
-        SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() ||
-        SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() ||
-        SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus())
-        && oneLoad.getVisibility().equalsIgnoreCase("true")) {
-      if (isForceDelete) {
-        return true;
-      }
-      long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-      return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil
-          .isMaxQueryTimeoutExceeded(deletionTime);
+      boolean cleanStaleInProgress, boolean isForceDeletion) {
+    if (oneLoad.getVisibility().equalsIgnoreCase("true")) {
+      return checkLoadDeletionLogic(oneLoad, isForceDeletion, cleanStaleInProgress);
     }
-
     return false;
   }
 
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad,
-      boolean isForceDelete) {
+      boolean isForceDeletion, boolean cleanStaleInProgress) {
     // Check if the segment is added externally and path is set then do not delete it
-    if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus()
-        || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null
-        || oneLoad.getPath().equalsIgnoreCase("NA"))) {
-      if (isForceDelete) {
-        return true;
-      }
-      long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-
-      return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil
-          .isMaxQueryTimeoutExceeded(deletionTime);
-
+    if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) {
+      return checkLoadDeletionLogic(oneLoad,  isForceDeletion, cleanStaleInProgress);

Review comment:
       ```suggestion
         return CanDeleteThisLoad(oneLoad,  isForceDeletion, cleanStaleInProgress);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535355003



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) {
   }
 
   private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
-      boolean isForceDelete) {
-    if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() ||
-        SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() ||
-        SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() ||
-        SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus())
-        && oneLoad.getVisibility().equalsIgnoreCase("true")) {
-      if (isForceDelete) {
-        return true;
-      }
-      long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-      return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil
-          .isMaxQueryTimeoutExceeded(deletionTime);
+      boolean cleanStaleInProgress, boolean isForceDeletion) {
+    if (oneLoad.getVisibility().equalsIgnoreCase("true")) {
+      return checkLoadDeletionLogic(oneLoad, isForceDeletion, cleanStaleInProgress);
     }
-
     return false;
   }
 
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad,
-      boolean isForceDelete) {
+      boolean isForceDeletion, boolean cleanStaleInProgress) {
     // Check if the segment is added externally and path is set then do not delete it
-    if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus()
-        || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null
-        || oneLoad.getPath().equalsIgnoreCase("NA"))) {
-      if (isForceDelete) {
-        return true;
-      }
-      long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-
-      return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil
-          .isMaxQueryTimeoutExceeded(deletionTime);
-
+    if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) {
+      return checkLoadDeletionLogic(oneLoad,  isForceDeletion, cleanStaleInProgress);
     }
-
     return false;
   }
 
+  private static Boolean checkLoadDeletionLogic(LoadMetadataDetails oneLoad,
+      boolean isForceDeletion, boolean cleanStaleInProgress) {
+    /*
+     * if cleanStaleInProgress == false and  force == false, clean MFD and Compacted
+     *  segments will depend on query timeout(1 hr) and trashRetentionTimeout(7 days, default).
+     *  For example:
+     *  If trashRetentionTimeout is 7 days and query timeout is 1 hr--> Delete after 7 days
+     *  If trashRetentionTimeout is 0 days and query timeout is 1 hr--> Delete after 1 hr
+     *
+     * if cleanStaleInProgress == false and  force == true, clean MFD and Compacted
+     *  segments immediately(Do not check for any timeout)
+     *
+     * if cleanStaleInProgress == true and  force == false, clean Stale Inprogress, MFD and
+     *  compacted segments after 7 days(taking carbon.trash.retention.time value)
+     *
+     * if cleanStaleInProgress == true and  force == true, clean MFD, Compacted and
+     *  stale inprogress segments immediately.(Do not check for any timeout)
+     */
+    if (isForceDeletion) {
+      // immediately delete compacted and MFD
+      if (cleanStaleInProgress) {
+        // immediately delete inprogress segments too
+        return SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus
+            .INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus
+            .COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus.MARKED_FOR_DELETE ==

Review comment:
       move COMPACTED and MARKED_FOR_DELETE above and return if true.
   check only INSERT_OVERWRITE_IN_PROGRESS, INSERT_IN_PROGRESS in INSERT_IN_PROGRESS




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535356095



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) {
   }
 
   private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
-      boolean isForceDelete) {
-    if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() ||
-        SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() ||
-        SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() ||
-        SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus())
-        && oneLoad.getVisibility().equalsIgnoreCase("true")) {
-      if (isForceDelete) {
-        return true;
-      }
-      long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-      return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil
-          .isMaxQueryTimeoutExceeded(deletionTime);
+      boolean cleanStaleInProgress, boolean isForceDeletion) {
+    if (oneLoad.getVisibility().equalsIgnoreCase("true")) {
+      return checkLoadDeletionLogic(oneLoad, isForceDeletion, cleanStaleInProgress);
     }
-
     return false;
   }
 
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad,
-      boolean isForceDelete) {
+      boolean isForceDeletion, boolean cleanStaleInProgress) {
     // Check if the segment is added externally and path is set then do not delete it
-    if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus()
-        || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null
-        || oneLoad.getPath().equalsIgnoreCase("NA"))) {
-      if (isForceDelete) {
-        return true;
-      }
-      long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-
-      return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && CarbonUpdateUtil
-          .isMaxQueryTimeoutExceeded(deletionTime);
-
+    if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) {
+      return checkLoadDeletionLogic(oneLoad,  isForceDeletion, cleanStaleInProgress);
     }
-
     return false;
   }
 
+  private static Boolean checkLoadDeletionLogic(LoadMetadataDetails oneLoad,
+      boolean isForceDeletion, boolean cleanStaleInProgress) {
+    /*
+     * if cleanStaleInProgress == false and  force == false, clean MFD and Compacted
+     *  segments will depend on query timeout(1 hr) and trashRetentionTimeout(7 days, default).
+     *  For example:
+     *  If trashRetentionTimeout is 7 days and query timeout is 1 hr--> Delete after 7 days
+     *  If trashRetentionTimeout is 0 days and query timeout is 1 hr--> Delete after 1 hr
+     *
+     * if cleanStaleInProgress == false and  force == true, clean MFD and Compacted
+     *  segments immediately(Do not check for any timeout)
+     *
+     * if cleanStaleInProgress == true and  force == false, clean Stale Inprogress, MFD and
+     *  compacted segments after 7 days(taking carbon.trash.retention.time value)
+     *
+     * if cleanStaleInProgress == true and  force == true, clean MFD, Compacted and
+     *  stale inprogress segments immediately.(Do not check for any timeout)
+     */
+    if (isForceDeletion) {
+      // immediately delete compacted and MFD
+      if (cleanStaleInProgress) {
+        // immediately delete inprogress segments too
+        return SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus
+            .INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus
+            .COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus.MARKED_FOR_DELETE ==
+            oneLoad.getSegmentStatus();
+      }
+      return SegmentStatus.COMPACTED ==
+          oneLoad.getSegmentStatus() || SegmentStatus.MARKED_FOR_DELETE == oneLoad
+          .getSegmentStatus();
+    }
+    long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
+    // in case there is no deletion or modification timestamp, take the load start time as
+    // deleteTime
+    if (deletionTime == 0) {
+      deletionTime = oneLoad.getLoadStartTime();
+    }
+    if (cleanStaleInProgress) {
+      // delete MFD, compacted and Inprogress segments after trash timeout
+      return (SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() || SegmentStatus

Review comment:
       same as above




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535357362



##########
File path: docs/clean-files.md
##########
@@ -24,6 +24,7 @@ Clean files command is used to remove the Compacted, Marked For Delete ,In Progr
    ```
    CLEAN FILES FOR TABLE TABLE_NAME
    ```
+The above clean files command will clean Marked For Delete and Compacted segments depending on query time (default 1 hr) and trash retention timeout (default 7 days). It will also delete the timestamp subdirectories from the trash folder after expiration day(default 7 day, can be configured)

Review comment:
       query time,  trash retention timeout  => use the original carbon property names




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535358582



##########
File path: docs/clean-files.md
##########
@@ -37,10 +38,23 @@ Clean files command is used to remove the Compacted, Marked For Delete ,In Progr
    ```
   Once the timestamp subdirectory is expired as per the configured expiration day value, that subdirectory is deleted from the trash folder in the subsequent clean files command.
 
-### FORCE DELETE TRASH
-The force option with clean files command deletes all the files and folders from the trash folder.
+### FORCE OPTION
+The force option with clean files command deletes all the files and folders from the trash folder and delete the Marked for Delete and Compacted segments immediately.
 
   ```
   CLEAN FILES FOR TABLE TABLE_NAME options('force'='true')
   ```
-Since Clean Files operation with force option will delete data that can never be recovered, the force option by default is disabled. Clean files with force option is only allowed when the carbon property ```carbon.clean.file.force.allowed``` is set to true. The default value of this property is false.
\ No newline at end of file
+Since Clean Files operation with force option will delete data that can never be recovered, the force option by default is disabled. Clean files with force option is only allowed when the carbon property ```carbon.clean.file.force.allowed``` is set to true. The default value of this property is false.

Review comment:
       move this section, above force as they have to enable this before trying force




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535359069



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##########
@@ -58,8 +58,11 @@ case class CarbonCleanFilesCommand(
   var carbonTable: CarbonTable = _
   var cleanFileCommands: List[CarbonCleanFilesCommand] = List.empty
   val optionsMap = options.getOrElse(List.empty[(String, String)]).toMap
-  // forceClean will empty trash
+  // forceClean will clean the MFD and Compacted segments immediately and also empty the trash
+  // folder
   val forceClean = optionsMap.getOrElse("force", "false").toBoolean
+  // stale_inprogress will clean the In Progress segments

Review comment:
       update comment saying, it will clean based on retention time and it will clean immediately when force is true




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535359638



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -53,13 +51,15 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
         val carbonTable = cleanFilesPostEvent.carbonTable
         val indexTables = CarbonIndexUtil
           .getIndexCarbonTables(carbonTable, cleanFilesPostEvent.sparkSession)
+        val force = cleanFilesPostEvent.ifForceDeletion

Review comment:
       ```suggestion
           val isForceDelete = cleanFilesPostEvent.ifForceDeletion
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535385941



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala
##########
@@ -346,4 +346,10 @@ class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll {
     sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1, "name"""")
   }
 
+  def removeSegmentEntryFromTableStatusFile(carbonTable: CarbonTable, segmentNo: String) : Unit = {

Review comment:
       @vikramahuja1001 : yes, instead of defining in each file, move it to some test util.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738128162


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5049/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress segments can be deleted

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738128556


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3291/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


1234