[GitHub] [carbondata] vikramahuja1001 opened a new pull request #3917: [WIP] clean files refactor


[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510659146



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitio
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)
+      throws Exception {
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
+    List<String> filesToDelete = new ArrayList<>();
     Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
     for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
-      FileFactory.deleteFile(entry.getKey());
+      // Move the file to the trash folder in case the segment status is insert in progress
+      if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) {
+        if (!isPartitionTable) {
+          TrashUtil.moveDataToTrashFolderByFile(tablePath, entry.getKey(), timeStamp +
+              CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + segment
+              .getSegmentNo());
+        } else {
+          TrashUtil.moveDataToTrashFolderByFile(tablePath, entry.getKey(), timeStamp +

Review comment:
       For a normal table, the trash layout is timestamp/Segment_#; there is no use in keeping the Fact and Part0 folders in the trash.
   For a partition table, the layout is timestamp/Segment_#/partition_folder; the segment number is included so that recovery can be done segment-wise.
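
To make the two layouts concrete, here is a minimal sketch of how the suffix passed to the trash helper could be built. The class name, method names, and the partition folder value are hypothetical; only the `timestamp/Segment_#` and `timestamp/Segment_#/partition_folder` shapes come from the comment above.

```java
public final class TrashLayoutSketch {

  // Assumed separator and segment folder prefix, mirroring the layout in the comment.
  private static final String SEPARATOR = "/";
  private static final String SEGMENT_PREFIX = "Segment_";

  private TrashLayoutSketch() {
  }

  /** Normal table: timestamp/Segment_N (no Fact or Part0 level). */
  public static String suffixForNormalTable(String timeStamp, String segmentNo) {
    return timeStamp + SEPARATOR + SEGMENT_PREFIX + segmentNo;
  }

  /** Partition table: timestamp/Segment_N/partitionFolder, so recovery stays segment-wise. */
  public static String suffixForPartitionTable(String timeStamp, String segmentNo,
      String partitionFolder) {
    return suffixForNormalTable(timeStamp, segmentNo) + SEPARATOR + partitionFolder;
  }

  public static void main(String[] args) {
    System.out.println(suffixForNormalTable("1603450000000", "2"));
    System.out.println(suffixForPartitionTable("1603450000000", "2", "country=IN"));
  }
}
```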




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] vikramahuja1001 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-714957473


   retest this please



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-715144339


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2902/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-715146278


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4656/
   



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510707186



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitio
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)
+      throws Exception {
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
+    List<String> filesToDelete = new ArrayList<>();
     Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
     for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
-      FileFactory.deleteFile(entry.getKey());
+      // Move the file to the trash folder in case the segment status is insert in progress
+      if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) {
+        if (!isPartitionTable) {
+          TrashUtil.copyDataToTrashFolderByFile(tablePath, entry.getKey(), timeStamp +

Review comment:
       Why not copy the whole segment instead of copying file by file?
   Multiple interactions with the file system may become a bottleneck for concurrent queries. I suggest copying the whole segment at once.
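
A rough sketch of what a segment-level copy could look like, using only `java.nio.file` on a local file system. This is an illustration, not the PR's code: the class and method names are hypothetical, and CarbonData's own `FileFactory`/`CarbonFile` abstractions (needed for HDFS or S3) are deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public final class SegmentCopySketch {

  private SegmentCopySketch() {
  }

  /**
   * Copies the whole segment directory into the trash in one call and
   * returns the number of regular files copied.
   */
  public static int copySegmentToTrash(Path segmentDir, Path trashSegmentDir)
      throws IOException {
    int copied = 0;
    // Files.walk visits a directory before its children, so target
    // directories always exist by the time their files are copied.
    try (Stream<Path> paths = Files.walk(segmentDir)) {
      for (Path source : (Iterable<Path>) paths::iterator) {
        Path target = trashSegmentDir.resolve(segmentDir.relativize(source).toString());
        if (Files.isDirectory(source)) {
          Files.createDirectories(target);
        } else {
          Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
          copied++;
        }
      }
    }
    return copied;
  }
}
```

The caller makes one call per segment instead of one per index or data file. The walk still touches every file underneath, so whether this removes the bottleneck depends on the underlying file system.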
   

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##########
@@ -47,6 +47,7 @@
   public static final String BATCH_PREFIX = "_batchno";
   private static final String LOCK_DIR = "LockFiles";
 
+  public static final String SEGMENTS_FOLDER = "segments";

Review comment:
       ```suggestion
     public static final String SEGMENTS_METADATA_FOLDER = "segments";
   ```

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util.path;
+
+import java.io.File;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.CarbonFileException;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.commons.io.FileUtils;
+
+import org.apache.log4j.Logger;
+
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+          LogServiceFactory.getLogService(CarbonUtil.class.getName());
+
+  /**
+   * The below method copies the complete a file to the trash folder. Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashFolderByFile(String carbonTablePath, String pathOfFileToCopy,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      if (new File(pathOfFileToCopy).exists()) {
+        FileUtils.copyFileToDirectory(new File(pathOfFileToCopy), new File(trashFolderPath));
+        LOGGER.info("File: " + pathOfFileToCopy + " successfully copied to the trash folder: "
+                + trashFolderPath);
+      }
+    } catch (IOException e) {
+      LOGGER.error("Unable to copy " + pathOfFileToCopy + " to the trash folder", e);
+    }
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashBySegment(CarbonFile path, String carbonTablePath,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      FileUtils.copyDirectory(new File(path.getAbsolutePath()), new File(trashFolderPath));
+      LOGGER.info("Segment: " + path.getAbsolutePath() + " has been copied to the trash folder" +
+          " successfully");
+    } catch (IOException e) {
+      LOGGER.error("Unable to create the trash folder and copy data to it", e);
+    }
+  }
+
+  /**
+   * The below method deletes timestamp subdirectories in the trash folder which have expired as
+   * per the user defined expiration time
+   */
+  public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR);
+          Long currentTime = Long.valueOf(new Timestamp(System.currentTimeMillis()).getTime());
+          Long givenTime = Long.valueOf(aB[aB.length - 1]);
+          // If the timeStamp at which the timeStamp subdirectory has expired as per the user
+          // defined value, delete the complete timeStamp subdirectory
+          if (givenTime + timeStamp < currentTime) {
+            deleteDataFromTrashFolderByFile(carbonFile);
+          }

Review comment:
       Add a log message stating that there is nothing to delete because the files have not expired.
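
One possible shape for this, as a minimal sketch: the expiry test mirrors the `givenTime + timeStamp < currentTime` condition from the diff, while the class name, method names, log wording, and the use of `java.util.logging` are assumptions made for illustration.

```java
import java.util.logging.Logger;

public final class TrashExpirySketch {

  private static final Logger LOGGER = Logger.getLogger(TrashExpirySketch.class.getName());

  private TrashExpirySketch() {
  }

  /** Same condition as in the diff: creation time plus retention is in the past. */
  public static boolean isExpired(long createdAtMillis, long retentionMillis, long nowMillis) {
    return createdAtMillis + retentionMillis < nowMillis;
  }

  /** Counts expired timestamp folders and logs when there is nothing to delete. */
  public static int countExpired(long[] folderTimestamps, long retentionMillis,
      long nowMillis) {
    int expired = 0;
    for (long createdAt : folderTimestamps) {
      if (isExpired(createdAt, retentionMillis, nowMillis)) {
        expired++;
      }
    }
    if (expired == 0) {
      LOGGER.info("Nothing to delete from the trash folder: no timestamp folder has expired");
    }
    return expired;
  }
}
```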

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+  /**
+   * The below method deletes timestamp subdirectories in the trash folder which have expired as
+   * per the user defined expiration time
+   */
+  public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR);

Review comment:
       Wouldn't taking a substring be better than splitting the path?
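
For illustration, a small sketch of the substring approach: take everything after the last separator rather than splitting the full path into an array. The class name, method name, and the example path are hypothetical, and "/" is assumed as the separator.

```java
public final class TrashFolderNameSketch {

  private TrashFolderNameSketch() {
  }

  /** Parses the timestamp from the last path component, e.g. ".../1603450000000". */
  public static long timestampOf(String absolutePath) {
    // Drop a single trailing separator so "/a/b/123/" behaves like "/a/b/123".
    String trimmed = absolutePath.endsWith("/")
        ? absolutePath.substring(0, absolutePath.length() - 1)
        : absolutePath;
    String folderName = trimmed.substring(trimmed.lastIndexOf('/') + 1);
    return Long.parseLong(folderName);
  }
}
```

This avoids allocating one string per path component, and a descriptive local name such as `folderName` replaces the need for an array variable altogether.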

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+  /**
+   * The below method deletes timestamp subdirectories in the trash folder which have expired as
+   * per the user defined expiration time
+   */
+  public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR);

Review comment:
       Also, use a better name for aB.

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1428,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final long MILLIS_SECONDS_IN_A_DAY = TimeUnit.DAYS.toMillis(1);

Review comment:
       Since it is used in only one place and TimeUnit.DAYS.toMillis(1) is very readable, I suggest there is no need to define a constant for it. Just use the expression directly.

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1049,7 +1049,7 @@ private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable
   }
 
   public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isForceDeletion,
-      List<PartitionSpec> partitionSpecs) throws IOException {
+      List<PartitionSpec> partitionSpecs, String timeStamp) throws IOException {

Review comment:
       The current time can be obtained at the point of moving to trash (at the beginning of the function), right? I see no need to change all the method signatures just for this.
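
A sketch of the idea under stated assumptions: the timestamp is captured once where the trash move starts, instead of being threaded through every method signature. The class, the injected clock, and the method names are hypothetical; the `Segment_` prefix follows the layout discussed earlier in this thread.

```java
import java.util.function.LongSupplier;

public final class CleanFilesRunSketch {

  // Captured once, so every segment moved in this run lands under the same folder.
  private final String timeStamp;

  /** The clock is injected only to keep the sketch testable. */
  public CleanFilesRunSketch(LongSupplier clock) {
    this.timeStamp = String.valueOf(clock.getAsLong());
  }

  /** Convenience factory using the system clock. */
  public static CleanFilesRunSketch start() {
    return new CleanFilesRunSketch(System::currentTimeMillis);
  }

  /** Builds timestamp/Segment_N without the caller passing a timestamp. */
  public String trashSuffixFor(String segmentNo) {
    return timeStamp + "/Segment_" + segmentNo;
  }
}
```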

##########
File path: docs/dml-of-carbondata.md
##########
@@ -562,3 +563,50 @@ CarbonData DML statements are documented here,which includes:
   ```
   CLEAN FILES FOR TABLE carbon_table
   ```
+
+## CLEAN FILES
+
+  Clean files command is used to remove the Compacted and Marked

Review comment:
       Just give a link to * [CLEAN FILES](./cleanfiles.md) here as well.
   

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2116,6 +2086,20 @@ public int getMaxSIRepairLimit(String dbName, String tableName) {
     return Math.abs(Integer.parseInt(thresholdValue));
   }
 
+  /**
+   * The below method returns the microseconds after which the trash folder will expire
+   */
+  public long getTrashFolderExpirationTime() {
+    String configuredValue = getProperty(CarbonCommonConstants.TRASH_EXPIRATION_DAYS,
+            CarbonCommonConstants.TRASH_EXPIRATION_DAYS_DEFAULT);
+    int result = Integer.parseInt(configuredValue);
+    if (result < 0) {
+      result = Integer.parseInt(TRASH_EXPIRATION_DAYS_DEFAULT);

Review comment:
       Add a warning log here.
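
A minimal sketch of the fallback with a warning, assuming `java.util.logging` and a made-up default of 3 days; the real default and property name live in CarbonCommonConstants and are not reproduced here. The conversion uses TimeUnit.DAYS.toMillis directly, in line with the earlier comment about not needing a separate constant.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

public final class TrashExpirationConfigSketch {

  private static final Logger LOGGER =
      Logger.getLogger(TrashExpirationConfigSketch.class.getName());

  // Hypothetical default, used only for this illustration.
  public static final int DEFAULT_DAYS = 3;

  private TrashExpirationConfigSketch() {
  }

  /** Returns the configured days, or the default (with a warning) if the value is invalid. */
  public static int resolveExpirationDays(String configuredValue) {
    try {
      int days = Integer.parseInt(configuredValue);
      if (days >= 0) {
        return days;
      }
    } catch (NumberFormatException e) {
      // Falls through to the warning below.
    }
    LOGGER.warning("Invalid trash expiration value '" + configuredValue
        + "', falling back to the default of " + DEFAULT_DAYS + " days");
    return DEFAULT_DAYS;
  }

  /** Converts days to milliseconds without a dedicated constant. */
  public static long toMillis(int days) {
    return TimeUnit.DAYS.toMillis(days);
  }
}
```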

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+  /**
+   * The below method deletes timestamp subdirectories in the trash folder which have expired as
+   * per the user defined expiration time
+   */
+  public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR);
+          Long currentTime = Long.valueOf(new Timestamp(System.currentTimeMillis()).getTime());
+          Long givenTime = Long.valueOf(aB[aB.length - 1]);
+          // If the timeStamp at which the timeStamp subdirectory has expired as per the user
+          // defined value, delete the complete timeStamp subdirectory
+          if (givenTime + timeStamp < currentTime) {
+            deleteDataFromTrashFolderByFile(carbonFile);
+          }
+        }
+      } catch (IOException e) {
+        LOGGER.error("Error during deleting from trash folder", e);
+      }
+    }
+  }
+
+  /**
+   * The below method deletes all the files and folders in the trash folder of a carbon table.
+   */
+  public static void deleteAllDataFromTrashFolder(String carbonTablePath)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // if the trash folder exists delete the contents of the trash folder, if it does not exists
+    // create a trash folder

Review comment:
       Update these comments.

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+  /**
+   * The below method deletes all the files and folders in the trash folder of a carbon table.
+   */
+  public static void deleteAllDataFromTrashFolder(String carbonTablePath)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // if the trash folder exists delete the contents of the trash folder, if it does not exists
+    // create a trash folder
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          deleteDataFromTrashFolderByFile(carbonFile);
+        }
+      } catch (IOException e) {
+        LOGGER.error("Error during deleting from trash folder", e);
+      }
+    }
+  }
+
+  /**
+   * The below method deletes a specific file in the trash folder.
+   */
+  private static void deleteDataFromTrashFolderByFile(CarbonFile carbonFile) {
+    try {
+      FileFactory.deleteAllCarbonFilesOfDir(carbonFile);

Review comment:
       It is not a specific file; I guess it deletes the whole folder. Please update the comment and the method header.
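
       On the expiry check in deleteAllDataFromTrashFolderByTimeStamp above: the last path component is parsed with Long.valueOf, which fails on any trash subdirectory whose name is not a number. A minimal sketch of a more defensive check follows, assuming the subdirectory name is the epoch-millisecond timestamp of the clean files call. The class and method names (TrashExpirySketch, isExpired) are hypothetical and not part of the PR.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the PR: decides whether a timestamp
// subdirectory of the trash folder has outlived its retention period.
final class TrashExpirySketch {

  /**
   * Returns true when the folder name parses as epoch milliseconds and is
   * older than retentionMillis relative to nowMillis. Names that are not
   * plain numbers are never reported as expired, so unrelated folders are
   * left untouched instead of raising NumberFormatException.
   */
  static boolean isExpired(String folderName, long retentionMillis, long nowMillis) {
    final long createdMillis;
    try {
      createdMillis = Long.parseLong(folderName);
    } catch (NumberFormatException e) {
      return false;
    }
    // Same condition as "givenTime + timeStamp < currentTime" in the diff,
    // written as a difference.
    return nowMillis - createdMillis > retentionMillis;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long threeDays = TimeUnit.DAYS.toMillis(3);
    String fourDaysOld = String.valueOf(now - TimeUnit.DAYS.toMillis(4));
    System.out.println(isExpired(fourDaysOld, threeDays, now));
    System.out.println(isExpired("not-a-timestamp", threeDays, now));
  }
}
```

       Passing the current time as a parameter keeps the check testable; the caller would supply System.currentTimeMillis() once per clean files run rather than once per folder.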

##########
File path: docs/dml-of-carbondata.md
##########
@@ -562,3 +563,50 @@ CarbonData DML statements are documented here,which includes:
   ```
   CLEAN FILES FOR TABLE carbon_table
   ```
+
+## CLEAN FILES
+
+  Clean files command is used to remove the Compacted and Marked
+  For Delete Segments from the store. Carbondata also supports Trash
+  Folder where all the stale data is moved to after clean files
+  is called
+
+  There are several types of compaction
+
+  ```
+  CLEAN FILES ON TABLE TableName
+  ```
+
+  - **Minor Compaction**

Review comment:
       Explaining what compaction is inside the clean files section is not good. This should be in the compaction section.

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -138,8 +143,19 @@ public boolean accept(CarbonFile file) {
               if (filesToBeDeleted.length == 0) {
                 status = true;
               } else {
-
                 for (CarbonFile eachFile : filesToBeDeleted) {
+                  // If the file to be deleted is a carbondata file, index file, index merge file
+                  // or a delta file, copy that file to the trash folder.
+                  if ((eachFile.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT) ||

Review comment:
       Same comment as above: copy the segment at once.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/cleanfiles/CleanFilesUtil.scala
##########
@@ -0,0 +1,409 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.cleanfiles
+
+import java.util
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ListBuffer
+
+import org.apache.spark.sql.{AnalysisException, CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.exception.ConcurrentOperationException
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.locks.{CarbonLockFactory, CarbonLockUtil, ICarbonLock, LockUsage}
+import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata, SegmentFileStore}
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatus, SegmentStatusManager}
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.core.util.path.{CarbonTablePath, TrashUtil}
+import org.apache.carbondata.processing.loading.TableProcessingOperations
+import org.apache.carbondata.processing.loading.model.CarbonLoadModel
+
+object CleanFilesUtil {
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
+  /**
+   * The method deletes all data if forceTableClean <true> and clean garbage segment
+   * (MARKED_FOR_DELETE state) if forceTableClean <false>
+   *
+   * @param dbName                 : Database name
+   * @param tableName              : Table name
+   * @param tablePath              : Table path
+   * @param carbonTable            : CarbonTable Object <null> in case of force clean
+   * @param forceTableClean        : <true> for force clean it will delete all data
+   *                               <false> it will clean garbage segment (MARKED_FOR_DELETE state)
+   * @param currentTablePartitions : Hive Partitions  details
+   */
+  def cleanFiles(
+    dbName: String,
+    tableName: String,
+    tablePath: String,
+    timeStamp: String,
+    carbonTable: CarbonTable,
+    forceTableClean: Boolean,
+    currentTablePartitions: Option[Seq[PartitionSpec]] = None,
+    truncateTable: Boolean = false): Unit = {
+    var carbonCleanFilesLock: ICarbonLock = null
+    val absoluteTableIdentifier = if (forceTableClean) {
+      AbsoluteTableIdentifier.from(tablePath, dbName, tableName, tableName)
+    } else {
+      carbonTable.getAbsoluteTableIdentifier
+    }
+    try {
+      val errorMsg = "Clean files request is failed for " +
+        s"$dbName.$tableName" +
+        ". Not able to acquire the clean files lock due to another clean files " +
+        "operation is running in the background."
+      // in case of force clean the lock is not required
+      if (forceTableClean) {
+        FileFactory.deleteAllCarbonFilesOfDir(
+          FileFactory.getCarbonFile(absoluteTableIdentifier.getTablePath))
+      } else {
+        carbonCleanFilesLock =
+          CarbonLockUtil
+            .getLockObject(absoluteTableIdentifier, LockUsage.CLEAN_FILES_LOCK, errorMsg)
+        if (truncateTable) {
+          SegmentStatusManager.truncateTable(carbonTable)
+        }
+        SegmentStatusManager.deleteLoadsAndUpdateMetadata(
+          carbonTable, true, currentTablePartitions.map(_.asJava).orNull, timeStamp)
+        CarbonUpdateUtil.cleanUpDeltaFiles(carbonTable, true)
+        currentTablePartitions match {
+          case Some(partitions) =>
+            SegmentFileStore.cleanSegments(
+              carbonTable,
+              currentTablePartitions.map(_.asJava).orNull,
+              timeStamp,
+              true)
+          case _ =>
+        }
+      }
+    } finally {
+      if (currentTablePartitions.equals(None)) {
+        cleanUpPartitionFoldersRecursively(carbonTable, List.empty[PartitionSpec])
+      } else {
+        cleanUpPartitionFoldersRecursively(carbonTable, currentTablePartitions.get.toList)
+      }
+
+      if (carbonCleanFilesLock != null) {
+        CarbonLockUtil.fileUnlock(carbonCleanFilesLock, LockUsage.CLEAN_FILES_LOCK)
+      }
+    }
+  }
+
+
+  /**
+   * delete partition folders recursively
+   *
+   * @param carbonTable
+   * @param partitionSpecList
+   */
+  def cleanUpPartitionFoldersRecursively(carbonTable: CarbonTable,
+      partitionSpecList: List[PartitionSpec]): Unit = {
+    if (carbonTable != null && carbonTable.isHivePartitionTable) {
+      val loadMetadataDetails = SegmentStatusManager
+        .readLoadMetadata(carbonTable.getMetadataPath)
+
+      val carbonFile = FileFactory.getCarbonFile(carbonTable.getTablePath)
+
+      // list all files from table path
+      val listOfDefaultPartFilesIterator = carbonFile.listFiles(true)
+      loadMetadataDetails.foreach { metadataDetail =>
+        if (metadataDetail.getSegmentStatus.equals(SegmentStatus.MARKED_FOR_DELETE) &&
+          metadataDetail.getSegmentFile == null) {
+          val loadStartTime: Long = metadataDetail.getLoadStartTime
+          // delete all files of @loadStartTime from table path
+          cleanCarbonFilesInFolder(listOfDefaultPartFilesIterator, loadStartTime)
+          partitionSpecList.foreach {
+            partitionSpec =>
+              val partitionLocation = partitionSpec.getLocation
+              // For partition folder outside the tablePath
+              if (!partitionLocation.toString.startsWith(carbonTable.getTablePath)) {
+                val partitionCarbonFile = FileFactory
+                  .getCarbonFile(partitionLocation.toString)
+                // list all files from partitionLocation
+                val listOfExternalPartFilesIterator = partitionCarbonFile.listFiles(true)
+                // delete all files of @loadStartTime from externalPath
+                cleanCarbonFilesInFolder(listOfExternalPartFilesIterator, loadStartTime)
+              }
+          }
+        }
+      }
+    }
+  }
+
+  /**
+   *
+   * @param carbonFiles
+   * @param timestamp
+   */
+  private def cleanCarbonFilesInFolder(carbonFiles: java.util.List[CarbonFile],
+      timestamp: Long): Unit = {
+    carbonFiles.asScala.foreach { carbonFile =>
+        val filePath = carbonFile.getPath
+        val fileName = carbonFile.getName
+        if (CarbonTablePath.DataFileUtil.compareCarbonFileTimeStamp(fileName, timestamp)) {
+          FileFactory.deleteFile(filePath)
+        }
+    }
+  }
+
+  /**
+   * The in-progress segments which are in stale state will be marked as deleted
+   * when driver is initializing.
+   *
+   * @param databaseLocation
+   * @param dbName
+   */
+  def cleanInProgressSegments(databaseLocation: String, dbName: String, timeStamp: String): Unit = {
+    val loaderDriver = CarbonProperties.getInstance().
+      getProperty(CarbonCommonConstants.DATA_MANAGEMENT_DRIVER,
+        CarbonCommonConstants.DATA_MANAGEMENT_DRIVER_DEFAULT).toBoolean
+    if (!loaderDriver) {
+      return
+    }
+    try {
+      if (FileFactory.isFileExist(databaseLocation)) {
+        val file = FileFactory.getCarbonFile(databaseLocation)
+        if (file.isDirectory) {
+          val tableFolders = file.listFiles()
+          tableFolders.foreach { tableFolder =>
+            if (tableFolder.isDirectory) {
+              val tablePath = databaseLocation + CarbonCommonConstants.FILE_SEPARATOR +
+               tableFolder.getName
+              val tableUniqueName = CarbonTable.buildUniqueName(dbName, tableFolder.getName)
+              val tableStatusFile =
+                CarbonTablePath.getTableStatusFilePath(tablePath)
+              if (FileFactory.isFileExist(tableStatusFile)) {
+                try {
+                  val carbonTable = CarbonMetadata.getInstance
+                    .getCarbonTable(tableUniqueName)
+                  SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable, true, null,
+                    timeStamp)
+                } catch {
+                  case _: Exception =>
+                    LOGGER.warn(s"Error while cleaning table " + s"$tableUniqueName")
+                }
+              }
+            }
+          }
+        }
+      }
+    } catch {
+      case s: java.io.FileNotFoundException =>
+        LOGGER.error(s)
+    }
+  }
+
+  /**
+   * The below method deletes all the files and folders in the trash folders of all carbon tables
+   * in all databases
+   */
+  def deleteDataFromTrashFolderInAllTables(sparkSession: SparkSession): Unit = {
+    try {
+      val databases = sparkSession.sessionState.catalog.listDatabases()
+      databases.foreach(dbName => {
+        val databaseLocation = CarbonEnv.getDatabaseLocation(dbName, sparkSession)
+        if (FileFactory.isFileExist(databaseLocation)) {
+          val file = FileFactory.getCarbonFile(databaseLocation)
+          if (file.isDirectory) {
+            val tableFolders = file.listFiles()
+            tableFolders.foreach { tableFolder =>
+              if (tableFolder.isDirectory) {
+                val tablePath = databaseLocation +
+                  CarbonCommonConstants.FILE_SEPARATOR + tableFolder.getName
+                TrashUtil.deleteAllDataFromTrashFolder(tablePath)
+              }
+            }
+          }
+        }
+      })
+    } catch {
+      case e: Throwable =>
+        // catch all exceptions to avoid failure
+        LogServiceFactory.getLogService(this.getClass.getCanonicalName)

Review comment:
       just use LOGGER




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox

akashrn5 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510614551



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1427,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final String MICROSECONDS_IN_A_DAY = "86400000";
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_TIME = "carbon.trash.expiration.time";

Review comment:
       ```suggestion
     public static final String CARBON_TRASH_EXPIRATION_TIME = "carbon.trash.expiration.time";
   ```

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitio
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)

Review comment:
       Please rename timeStamp, same as the comment above.

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -67,22 +69,23 @@ private static String getSegmentPath(AbsoluteTableIdentifier identifier,
   }
 
   public static void physicalFactAndMeasureMetadataDeletion(CarbonTable carbonTable,
-      LoadMetadataDetails[] newAddedLoadHistoryList,
-      boolean isForceDelete,
-      List<PartitionSpec> specs) {
+      LoadMetadataDetails[] newAddedLoadHistoryList, boolean isForceDelete,
+      List<PartitionSpec> specs, String timeStamp) {

Review comment:
       Variable name: please rename timeStamp here as well.

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2116,6 +2086,20 @@ public int getMaxSIRepairLimit(String dbName, String tableName) {
     return Math.abs(Integer.parseInt(thresholdValue));
   }
 
+  /**
+   * The below method returns the microseconds after which the trash folder will expire
+   */
+  public long getTrashFolderExpirationTime() {
+    String configuredValue = getProperty(CarbonCommonConstants.TRASH_EXPIRATION_DAYS,
+            CarbonCommonConstants.TRASH_EXPIRATION_DAYS_DEFAULT);
+    int result = Integer.parseInt(configuredValue);

Review comment:
       It may throw a NumberFormatException.
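
       One way to address this is to fall back to the default when the configured value cannot be parsed. The following is a minimal sketch, not the PR's implementation: the class and method names (TrashRetentionSketch, retentionMillis) are hypothetical, the default of 3 days is taken from the constants in this diff, and treating negative values as invalid is an assumption.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the PR: turns the configured number of
// days into milliseconds without letting a bad value escape as an exception.
final class TrashRetentionSketch {

  // Default of 3 days, as in TRASH_EXPIRATION_DAYS_DEFAULT in the diff.
  static final int DEFAULT_DAYS = 3;

  /**
   * Parses the configured retention (in days) and returns it in milliseconds.
   * Missing, non-numeric, or negative values fall back to DEFAULT_DAYS.
   */
  static long retentionMillis(String configuredDays) {
    int days = DEFAULT_DAYS;
    if (configuredDays != null) {
      try {
        int parsed = Integer.parseInt(configuredDays.trim());
        if (parsed >= 0) {
          days = parsed;
        }
      } catch (NumberFormatException e) {
        // Keep the default; real code would also log a warning here.
      }
    }
    return TimeUnit.DAYS.toMillis(days);
  }

  public static void main(String[] args) {
    System.out.println(retentionMillis("7"));
    System.out.println(retentionMillis("seven"));
  }
}
```

       TimeUnit.DAYS.toMillis also makes the unit explicit: the returned value is in milliseconds, not microseconds as the Javadoc in the diff says.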

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1427,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final String MICROSECONDS_IN_A_DAY = "86400000";
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_TIME = "carbon.trash.expiration.time";
+
+  /**
+   * Default expiration time of trash folder is 3 days.
+   */
+  public static final String TRASH_EXPIRATION_TIME_DEFAULT = "3";

Review comment:
       ```suggestion
     public static final String CARBON_TRASH_EXPIRATION_TIME_DEFAULT = "3";
   ```

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1033,7 +1034,7 @@ public static void commitDropPartitions(CarbonTable carbonTable, String uniqueId
    * @throws IOException
    */
   public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitionSpecs,
-      boolean forceDelete) throws IOException {
+      String timeStamp, boolean forceDelete) throws IOException {

Review comment:
       What is this timestamp? Please give it a meaningful variable name.

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1428,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final long MILLIS_SECONDS_IN_A_DAY = TimeUnit.DAYS.toMillis(1);
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_DAYS = "carbon.trash.expiration.days";
+
+  /**
+   * Default expiration time of trash folder is 3 days.
+   */
+  public static final String TRASH_EXPIRATION_DAYS_DEFAULT = "3";

Review comment:
       ```suggestion
     public static final String CARBON_TRASH_EXPIRATION_DAYS_DEFAULT = "3";
   ```

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1049,7 +1049,7 @@ private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable
   }
 
   public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isForceDeletion,
-      List<PartitionSpec> partitionSpecs) throws IOException {
+      List<PartitionSpec> partitionSpecs, String timeStamp) throws IOException {

Review comment:
       please rename everywhere

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##########
@@ -792,4 +793,9 @@ public static String getParentPath(String dataFilePath) {
       return dataFilePath;
     }
   }
+
+  public static String getTrashFolder(String carbonTablePath) {

Review comment:
       ```suggestion
     public static String getTrashFolderPath(String carbonTablePath) {
   ```

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1428,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final long MILLIS_SECONDS_IN_A_DAY = TimeUnit.DAYS.toMillis(1);
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_DAYS = "carbon.trash.expiration.days";

Review comment:
       ```suggestion
     public static final String CARBON_TRASH_EXPIRATION_DAYS = "carbon.trash.expiration.days";
   ```





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512545772



##########
File path: docs/cleanfiles.md
##########
@@ -0,0 +1,78 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+
+## CLEAN FILES
+
+Clean files command is used to remove the Compacted, Marked For Delete ,In Progress which are stale and Partial(Segments which are missing from the table status file but their data is present)
+ segments from the store.
+
+ Clean Files Command
+   ```
+   CLEAN FILES ON TABLE TABLE_NAME

Review comment:
       changed





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512548768



##########
File path: docs/cleanfiles.md
##########
@@ -0,0 +1,78 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+
+## CLEAN FILES
+
+Clean files command is used to remove the Compacted, Marked For Delete ,In Progress which are stale and Partial(Segments which are missing from the table status file but their data is present)
+ segments from the store.
+
+ Clean Files Command
+   ```
+   CLEAN FILES ON TABLE TABLE_NAME
+   ```
+
+
+### TRASH FOLDER
+
+  Carbondata supports a Trash Folder which is used as a redundant folder where all the unnecessary files and folders are moved to during clean files operation.
+  This trash folder is mantained inside the table path. It is a hidden folder(.Trash). The segments that are moved to the trash folder are mantained under a timestamp
+  subfolder(timestamp at which clean files operation is called). This helps the user to list down segments by timestamp.  By default all the timestamp sub-directory have an expiration
+  time of (3 days since that timestamp) and it can be configured by the user using the following carbon property
+   ```
+   carbon.trash.expiration.time = "Number of days"
+   ```
+  Once the timestamp subdirectory is expired as per the configured expiration day value, the subdirectory is deleted from the trash folder in the subsequent clean files command.
+  
+
+
+
+### DRY RUN
+  Support for dry run is provided before the actual clean files operation. This dry run operation will list down all the segments which are going to be manipulated during
+  the clean files operation. The dry run result will show the current location of the segment(it can be in FACT folder, Partition folder or trash folder) and where that segment
+  will be moved(to the trash folder or deleted from store) once the actual operation will be called.
+  
+
+  ```
+  CLEAN FILES ON TABLE TABLE_NAME options('dry_run'='true')
+  ```
+
+### FORCE DELETE TRASH
+The force option with clean files command deletes all the files and folders from the trash folder.
+
+  ```
+  CLEAN FILES ON TABLE TABLE_NAME options('force'='true')
+  ```
+
+### DATA RECOVERY FROM THE TRASH FOLDER
+
+The segments from can be recovered from the trash folder by creating an external table from the desired segment location

Review comment:
       changed
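
       For reference, recovery depends on knowing where a segment lands inside the trash folder. The test in this PR builds that location as the trash folder, then the timestamp subfolder, then LOAD_FOLDER plus the segment number, and passes it to ALTER TABLE ... ADD SEGMENT. A minimal sketch of that layout follows. The class name TrashPathSketch is hypothetical, ".Trash" is the folder name given in the docs above, and "Segment_" is assumed to be the value of CarbonCommonConstants.LOAD_FOLDER.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the PR: builds the location of a segment
// that clean files has moved to the trash folder.
final class TrashPathSketch {

  // Hidden trash folder name as described in docs/cleanfiles.md.
  static final String TRASH_FOLDER = ".Trash";
  // Assumed value of CarbonCommonConstants.LOAD_FOLDER.
  static final String SEGMENT_PREFIX = "Segment_";

  /** Returns tablePath/.Trash/timestamp/Segment_segmentId. */
  static Path segmentPath(String tablePath, long timestamp, String segmentId) {
    return Paths.get(tablePath, TRASH_FOLDER, String.valueOf(timestamp),
        SEGMENT_PREFIX + segmentId);
  }

  public static void main(String[] args) {
    // The table path and timestamp are made-up example values.
    System.out.println(segmentPath("/store/default/cleantest", 1603000000000L, "0"));
  }
}
```

       The resulting path is what would go into the 'path' option when adding the segment to a new table.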

##########
File path: docs/dml-of-carbondata.md
##########
@@ -552,3 +553,50 @@ CarbonData DML statements are documented here,which includes:
   ```
   CLEAN FILES FOR TABLE carbon_table
   ```
+
+## CLEAN FILES
+
+  Clean files command is used to remove the Compacted and Marked
+  For Delete Segments from the store. Carbondata also supports Trash
+  Folder where all the stale data is moved to after clean files
+  is called
+
+  There are several types of compaction
+
+  ```
+  CLEAN FILES ON TABLE TableName
+  ```

Review comment:
       done





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512549332



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala
##########
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.carbondata.spark.testsuite.cleanfiles
+
+import java.io.{File, PrintWriter}
+import java.util
+import java.util.List
+
+import org.apache.carbondata.cleanfiles.CleanFilesUtil
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonUtil
+import org.apache.spark.sql.{CarbonEnv, Row}
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import scala.io.Source
+
+class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll {
+
+  var count = 0
+
+  test("clean up table and test trash folder with In Progress segments") {
+    sql("""DROP TABLE IF EXISTS CLEANTEST""")
+    sql("""DROP TABLE IF EXISTS CLEANTEST1""")
+    sql(
+      """
+        | CREATE TABLE cleantest (name String, id Int)
+        | STORED AS carbondata
+      """.stripMargin)
+    sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1""")
+    sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1""")
+    sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1""")
+    // run a select query before deletion
+    checkAnswer(sql(s"""select count(*) from cleantest"""),
+      Seq(Row(3)))
+
+    val path = CarbonEnv.getCarbonTable(Some("default"), "cleantest")(sqlContext.sparkSession)
+      .getTablePath
+    val tableStatusFilePath = path + CarbonCommonConstants.FILE_SEPARATOR + "Metadata" +
+      CarbonCommonConstants.FILE_SEPARATOR + "tableStatus"
+    editTableStatusFile(path)
+    val trashFolderPath = path + CarbonCommonConstants.FILE_SEPARATOR +
+      CarbonCommonConstants.CARBON_TRASH_FOLDER_NAME
+
+    assert(!FileFactory.isFileExist(trashFolderPath))
+    val dryRun = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('isDryRun'='true')").count()
+    // dry run shows 3 segments to move to trash
+    assert(dryRun == 3)
+
+    sql(s"CLEAN FILES FOR TABLE cleantest").show
+
+    checkAnswer(sql(s"""select count(*) from cleantest"""),
+      Seq(Row(0)))
+    assert(FileFactory.isFileExist(trashFolderPath))
+    var list = getFileCountInTrashFolder(trashFolderPath)
+    assert(list == 6)
+
+    val dryRun1 = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('isDryRun'='true')").count()
+    sql(s"CLEAN FILES FOR TABLE cleantest").show
+
+    count = 0
+    list = getFileCountInTrashFolder(trashFolderPath)
+    // no carbondata file is added to the trash
+    assert(list == 6)
+
+
+    val timeStamp = getTimestampFolderName(trashFolderPath)
+
+    // recovering data from trash folder
+    sql(
+      """
+        | CREATE TABLE cleantest1 (name String, id Int)
+        | STORED AS carbondata
+      """.stripMargin)
+
+    val segment0Path = trashFolderPath + CarbonCommonConstants.FILE_SEPARATOR + timeStamp +
+      CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + '0'
+    val segment1Path = trashFolderPath + CarbonCommonConstants.FILE_SEPARATOR + timeStamp +
+      CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + '1'
+    val segment2Path = trashFolderPath + CarbonCommonConstants.FILE_SEPARATOR + timeStamp +
+      CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + '2'
+
+    sql(s"alter table cleantest1 add segment options('path'='$segment0Path'," +
+      s"'format'='carbon')").show()
+    sql(s"alter table cleantest1 add segment options('path'='$segment1Path'," +
+      s"'format'='carbon')").show()
+    sql(s"alter table cleantest1 add segment options('path'='$segment2Path'," +
+      s"'format'='carbon')").show()
+    sql(s"""INSERT INTO CLEANTEST SELECT * from cleantest1""")
+
+    // test after recovering data from trash
+    checkAnswer(sql(s"""select count(*) from cleantest"""),
+      Seq(Row(3)))
+
+    sql(s"CLEAN FILES FOR TABLE cleantest options('force'='true')").show
+    count = 0
+    list = getFileCountInTrashFolder(trashFolderPath)
+    // no carbondata file is added to the trash
+    assert(list == 0)
+    sql("""DROP TABLE IF EXISTS CLEANTEST""")
+    sql("""DROP TABLE IF EXISTS CLEANTEST1""")
+  }
+
+
+  test("clean up maintable table and test trash folder with SI with IN PROGRESS segments") {
+
+    sql("""DROP TABLE IF EXISTS CLEANTEST_WITHSI""")
+    sql("""DROP TABLE IF EXISTS CLEANTEST1""")
+    sql(
+      """
+        | CREATE TABLE CLEANTEST_WITHSI (id Int, name String, add String )
+        | STORED AS carbondata
+      """.stripMargin)
+    sql(s"""INSERT INTO CLEANTEST_WITHSI SELECT 1,"abc","def"""")
+    sql(s"""INSERT INTO CLEANTEST_WITHSI SELECT 2, "abc","def"""")
+    sql(s"""INSERT INTO CLEANTEST_WITHSI SELECT 3, "abc","def"""")
+
+    sql(s"""CREATE INDEX SI_CLEANTEST on cleantest_withSI(add) as 'carbondata' """)
+
+    checkAnswer(sql(s"""select count(*) from cleantest_withSI"""),
+      Seq(Row(3)))
+    checkAnswer(sql(s"""select count(*) from si_cleantest"""),
+      Seq(Row(3)))
+
+    val mainTablePath = CarbonEnv.getCarbonTable(Some("default"), "cleantest_withsi")(sqlContext
+      .sparkSession).getTablePath
+    editTableStatusFile(mainTablePath)
+    val mainTableTrashFolderPath = mainTablePath + CarbonCommonConstants.FILE_SEPARATOR +
+      CarbonCommonConstants.CARBON_TRASH_FOLDER_NAME
+
+    val siTablePath = CarbonEnv.getCarbonTable(Some("default"), "si_cleantest")(sqlContext
+      .sparkSession).getTablePath
+    editTableStatusFile(siTablePath)
+    val siTableTrashFolderPath = siTablePath + CarbonCommonConstants.FILE_SEPARATOR +
+      CarbonCommonConstants.CARBON_TRASH_FOLDER_NAME
+
+    assert(!FileFactory.isFileExist(mainTableTrashFolderPath))
+    assert(!FileFactory.isFileExist(siTableTrashFolderPath))
+
+    val dryRun = sql(s"CLEAN FILES FOR TABLE cleantest_withsi OPTIONS('isDryRun'='true')").count()
+    // dry run shows 6 segments to move to trash. 3 for main table, 3 for si table
+    assert(dryRun == 6)
+
+    sql(s"CLEAN FILES FOR TABLE CLEANTEST_WITHSI").show()
+

Review comment:
       If I drop the index, why would I run clean files on the SI table?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512550325



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitio
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)
+      throws Exception {
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
+    List<String> filesToDelete = new ArrayList<>();
     Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
     for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
-      FileFactory.deleteFile(entry.getKey());
+      // Move the file to the trash folder in case the segment status is insert in progress
+      if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) {

Review comment:
       For the normal table flow, I have changed it to copy to trash by segment. For partition tables it copies to trash file by file, because I have to read the segment file to get the carbondata and index files of each segment, which increases the IO time.
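To make the two strategies in this comment concrete, here is a minimal sketch using only `java.nio.file`. It is not the `TrashUtil` code from the PR: the class `TrashCopySketch` and both method names are hypothetical, and it assumes that a normal table keeps all files of a segment under one segment folder, while for a partition table the caller has already read the segment file and resolved the list of files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; it is not TrashUtil from the PR.
final class TrashCopySketch {

  // Normal table (assumption: every file of a segment sits under one segment folder),
  // so the whole folder tree can be copied without consulting any metadata.
  static void copySegmentFolder(Path segmentDir, Path trashSegmentDir) throws IOException {
    try (Stream<Path> walk = Files.walk(segmentDir)) {
      // Files.walk visits a directory before its children, so parents are created first.
      for (Path source : (Iterable<Path>) walk::iterator) {
        Path target = trashSegmentDir.resolve(segmentDir.relativize(source).toString());
        if (Files.isDirectory(source)) {
          Files.createDirectories(target);
        } else {
          Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }

  // Partition table (assumption: the caller has already read the segment file and
  // resolved the data and index files that belong to the segment).
  static void copyFiles(List<Path> filesOfSegment, Path trashSegmentDir) throws IOException {
    Files.createDirectories(trashSegmentDir);
    for (Path file : filesOfSegment) {
      if (Files.exists(file)) {
        Files.copy(file, trashSegmentDir.resolve(file.getFileName().toString()),
            StandardCopyOption.REPLACE_EXISTING);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("trash-copy-sketch");
    Path segment = Files.createDirectories(root.resolve("Segment_0"));
    Path dataFile = Files.write(segment.resolve("part-0.carbondata"), new byte[] {1, 2, 3});
    copySegmentFolder(segment, root.resolve("trash-by-segment"));
    copyFiles(Arrays.asList(dataFile), root.resolve("trash-by-file"));
    System.out.println("Copied into " + root);
  }
}
```

The by-segment path needs a single directory walk, whereas the by-file path depends on a file list that has to come from somewhere else, which is the extra IO mentioned above.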





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512550744



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -138,8 +143,19 @@ public boolean accept(CarbonFile file) {
               if (filesToBeDeleted.length == 0) {
                 status = true;
               } else {
-
                 for (CarbonFile eachFile : filesToBeDeleted) {
+                  // If the file to be deleted is a carbondata file, index file, index merge file
+                  // or a delta file, copy that file to the trash folder.
+                  if ((eachFile.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT) ||

Review comment:
       Copying segment-wise for normal tables; for the partition flow I have kept it file by file.





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512551259



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -192,11 +208,17 @@ private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
   }
 
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails oneLoad,
-      boolean isForceDelete) {
+      boolean isForceDelete, AbsoluteTableIdentifier absoluteTableIdentifier) {
     // Check if the segment is added externally and path is set then do not delete it
     if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus()
-        || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null
+        || SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() || SegmentStatus
+        .INSERT_IN_PROGRESS == oneLoad.getSegmentStatus()) && (oneLoad.getPath() == null

Review comment:
       I am not sure about this. Maybe we can discuss it with @ajantha-bhat or @akashrn5 once?





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512551385



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1428,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final long MILLIS_SECONDS_IN_A_DAY = TimeUnit.DAYS.toMillis(1);

Review comment:
       done





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512551780



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitio
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)
+      throws Exception {
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
+    List<String> filesToDelete = new ArrayList<>();
     Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
     for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
-      FileFactory.deleteFile(entry.getKey());
+      // Move the file to the trash folder in case the segment status is insert in progress
+      if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) {
+        if (!isPartitionTable) {
+          TrashUtil.copyDataToTrashFolderByFile(tablePath, entry.getKey(), timeStamp +

Review comment:
       Copying the whole segment at once for normal tables, but for partition tables doing it at file level.





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512552552



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1049,7 +1049,7 @@ private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable
   }
 
   public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isForceDeletion,
-      List<PartitionSpec> partitionSpecs) throws IOException {
+      List<PartitionSpec> partitionSpecs, String timeStamp) throws IOException {

Review comment:
       I have changed this behaviour. After this change, even a single clean files command can create multiple timestamp subdirectories. The user can use the tree command to list the files and use whichever timestamp subfolder they need.
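For reference, the layout that the test in this PR reads back (trash folder, then a timestamp subdirectory, then the segment folder) can be sketched as below. The class `TrashLayoutSketch` is hypothetical, and the literal values `.Trash` and `Segment_` are assumptions standing in for `CarbonCommonConstants.CARBON_TRASH_FOLDER_NAME` and `CarbonCommonConstants.LOAD_FOLDER`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; the real code builds this path by
// concatenating CarbonCommonConstants values with FILE_SEPARATOR.
final class TrashLayoutSketch {

  // Assumed values of CARBON_TRASH_FOLDER_NAME and LOAD_FOLDER.
  static final String TRASH_FOLDER = ".Trash";
  static final String SEGMENT_PREFIX = "Segment_";

  // <tablePath>/<trash folder>/<timestamp>/<segment prefix><segmentId>
  static Path trashSegmentPath(String tablePath, long timestampMillis, String segmentId) {
    return Paths.get(tablePath, TRASH_FOLDER, Long.toString(timestampMillis),
        SEGMENT_PREFIX + segmentId);
  }

  public static void main(String[] args) {
    // Two different timestamps give two sibling subdirectories under the same trash folder.
    System.out.println(trashSegmentPath("store/default/cleantest", 1603800000000L, "0"));
    System.out.println(trashSegmentPath("store/default/cleantest", 1603800005000L, "1"));
  }
}
```

Because the timestamp is part of the path, every distinct timestamp value produces a separate subdirectory, and a segment can be recovered by pointing `add segment` at the matching subfolder.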

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2116,6 +2086,20 @@ public int getMaxSIRepairLimit(String dbName, String tableName) {
     return Math.abs(Integer.parseInt(thresholdValue));
   }
 
+  /**
+   * The below method returns the microseconds after which the trash folder will expire
+   */
+  public long getTrashFolderExpirationTime() {
+    String configuredValue = getProperty(CarbonCommonConstants.TRASH_EXPIRATION_DAYS,
+            CarbonCommonConstants.TRASH_EXPIRATION_DAYS_DEFAULT);
+    int result = Integer.parseInt(configuredValue);
+    if (result < 0) {
+      result = Integer.parseInt(TRASH_EXPIRATION_DAYS_DEFAULT);

Review comment:
       done
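A minimal standalone sketch of the fallback in this hunk, under stated assumptions: `TrashExpirySketch` and `expirationMillis` are hypothetical names, the default is passed in rather than read from `TRASH_EXPIRATION_DAYS_DEFAULT`, and the result is in milliseconds, in line with `MILLIS_SECONDS_IN_A_DAY` being built from `TimeUnit.DAYS.toMillis(1)`. Unlike the hunk, it also falls back when the configured value is missing or not a number.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; it is not CarbonProperties.
final class TrashExpirySketch {

  // Converts a configured number of days into milliseconds, falling back to
  // defaultDays when the value is missing, not a number, or negative.
  static long expirationMillis(String configuredDays, int defaultDays) {
    int days = defaultDays;
    if (configuredDays != null) {
      try {
        days = Integer.parseInt(configuredDays.trim());
      } catch (NumberFormatException e) {
        days = defaultDays;
      }
    }
    if (days < 0) {
      days = defaultDays;
    }
    return TimeUnit.DAYS.toMillis(days);
  }

  public static void main(String[] args) {
    System.out.println(expirationMillis("3", 7));
    System.out.println(expirationMillis("-1", 7));
    System.out.println(expirationMillis("not-a-number", 7));
  }
}
```

Returning a `long` avoids the overflow that an `int` multiplication of days by milliseconds per day would hit for large values.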





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512552754



##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##########
@@ -47,6 +47,7 @@
   public static final String BATCH_PREFIX = "_batchno";
   private static final String LOCK_DIR = "LockFiles";
 
+  public static final String SEGMENTS_FOLDER = "segments";

Review comment:
       done





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512553372



##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util.path;
+
+import java.io.File;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.CarbonFileException;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.commons.io.FileUtils;
+
+import org.apache.log4j.Logger;
+
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+          LogServiceFactory.getLogService(CarbonUtil.class.getName());
+
+  /**
+   * The below method copies the complete a file to the trash folder. Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashFolderByFile(String carbonTablePath, String pathOfFileToCopy,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      if (new File(pathOfFileToCopy).exists()) {
+        FileUtils.copyFileToDirectory(new File(pathOfFileToCopy), new File(trashFolderPath));
+        LOGGER.info("File: " + pathOfFileToCopy + " successfully copied to the trash folder: "
+                + trashFolderPath);
+      }
+    } catch (IOException e) {
+      LOGGER.error("Unable to copy " + pathOfFileToCopy + " to the trash folder", e);
+    }
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashBySegment(CarbonFile path, String carbonTablePath,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      FileUtils.copyDirectory(new File(path.getAbsolutePath()), new File(trashFolderPath));
+      LOGGER.info("Segment: " + path.getAbsolutePath() + " has been copied to the trash folder" +
+          " successfully");
+    } catch (IOException e) {
+      LOGGER.error("Unable to create the trash folder and copy data to it", e);
+    }
+  }
+
+  /**
+   * The below method deletes timestamp subdirectories in the trash folder which have expired as
+   * per the user defined expiration time
+   */
+  public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR);

Review comment:
       The names differ between partition tables and normal tables; I have changed the variable name, though.

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util.path;
+
+import java.io.File;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.CarbonFileException;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.commons.io.FileUtils;
+
+import org.apache.log4j.Logger;
+
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+          LogServiceFactory.getLogService(CarbonUtil.class.getName());
+
+  /**
+   * The below method copies the complete a file to the trash folder. Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashFolderByFile(String carbonTablePath, String pathOfFileToCopy,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      if (new File(pathOfFileToCopy).exists()) {
+        FileUtils.copyFileToDirectory(new File(pathOfFileToCopy), new File(trashFolderPath));
+        LOGGER.info("File: " + pathOfFileToCopy + " successfully copied to the trash folder: "
+                + trashFolderPath);
+      }
+    } catch (IOException e) {
+      LOGGER.error("Unable to copy " + pathOfFileToCopy + " to the trash folder", e);
+    }
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashBySegment(CarbonFile path, String carbonTablePath,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      FileUtils.copyDirectory(new File(path.getAbsolutePath()), new File(trashFolderPath));
+      LOGGER.info("Segment: " + path.getAbsolutePath() + " has been copied to the trash folder" +
+          " successfully");
+    } catch (IOException e) {
+      LOGGER.error("Unable to create the trash folder and copy data to it", e);
+    }
+  }
+
+  /**
+   * The below method deletes timestamp subdirectories in the trash folder which have expired as
+   * per the user defined expiration time
+   */
+  public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR);
+          Long currentTime = Long.valueOf(new Timestamp(System.currentTimeMillis()).getTime());
+          Long givenTime = Long.valueOf(aB[aB.length - 1]);
+          // If the timeStamp at which the timeStamp subdirectory has expired as per the user
+          // defined value, delete the complete timeStamp subdirectory
+          if (givenTime + timeStamp < currentTime) {
+            deleteDataFromTrashFolderByFile(carbonFile);
+          }

Review comment:
       done
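The expiry check in this hunk can be isolated into a small predicate, sketched here under assumptions: `TrashSweepSketch` and `isExpired` are hypothetical names, the subdirectory name is taken to be the creation time in epoch milliseconds, and a name that is not a number is treated as not expired instead of throwing.

```java
// Hypothetical helper for illustration only; it is not TrashUtil from the PR.
final class TrashSweepSketch {

  // Returns true when a timestamp-named trash subdirectory is older than the
  // retention period. Equivalent to "givenTime + timeStamp < currentTime" in the
  // hunk, written as a subtraction so a very large retention value cannot
  // overflow the sum.
  static boolean isExpired(String dirName, long retentionMillis, long nowMillis) {
    long createdMillis;
    try {
      createdMillis = Long.parseLong(dirName);
    } catch (NumberFormatException e) {
      // Assumption: folders whose names are not timestamps are left alone.
      return false;
    }
    return nowMillis - createdMillis > retentionMillis;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long oneDay = 24L * 60 * 60 * 1000;
    System.out.println(isExpired(Long.toString(now - 2 * oneDay), oneDay, now));
    System.out.println(isExpired(Long.toString(now), oneDay, now));
  }
}
```

Passing the current time in as a parameter keeps the check deterministic, so it can be unit tested without waiting for a real expiry.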

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util.path;
+
+import java.io.File;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.CarbonFileException;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.commons.io.FileUtils;
+
+import org.apache.log4j.Logger;
+
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+          LogServiceFactory.getLogService(CarbonUtil.class.getName());
+
+  /**
+   * The below method copies the complete a file to the trash folder. Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashFolderByFile(String carbonTablePath, String pathOfFileToCopy,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      if (new File(pathOfFileToCopy).exists()) {
+        FileUtils.copyFileToDirectory(new File(pathOfFileToCopy), new File(trashFolderPath));
+        LOGGER.info("File: " + pathOfFileToCopy + " successfully copied to the trash folder: "
+                + trashFolderPath);
+      }
+    } catch (IOException e) {
+      LOGGER.error("Unable to copy " + pathOfFileToCopy + " to the trash folder", e);
+    }
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashBySegment(CarbonFile path, String carbonTablePath,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      FileUtils.copyDirectory(new File(path.getAbsolutePath()), new File(trashFolderPath));
+      LOGGER.info("Segment: " + path.getAbsolutePath() + " has been copied to the trash folder" +
+          " successfully");
+    } catch (IOException e) {
+      LOGGER.error("Unable to create the trash folder and copy data to it", e);
+    }
+  }
+
+  /**
+   * The below method deletes timestamp subdirectories in the trash folder which have expired as
+   * per the user defined expiration time
+   */
+  public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR);
+          Long currentTime = Long.valueOf(new Timestamp(System.currentTimeMillis()).getTime());
+          Long givenTime = Long.valueOf(aB[aB.length - 1]);
+          // If the timeStamp at which the timeStamp subdirectory has expired as per the user
+          // defined value, delete the complete timeStamp subdirectory
+          if (givenTime + timeStamp < currentTime) {
+            deleteDataFromTrashFolderByFile(carbonFile);
+          }
+        }
+      } catch (IOException e) {
+        LOGGER.error("Error during deleting from trash folder", e);
+      }
+    }
+  }
+
+  /**
+   * The below method deletes all the files and folders in the trash folder of a carbon table.
+   */
+  public static void deleteAllDataFromTrashFolder(String carbonTablePath)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // if the trash folder exists delete the contents of the trash folder, if it does not exists
+    // create a trash folder

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##########
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util.path;
+
+import java.io.File;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.CarbonFileException;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.commons.io.FileUtils;
+
+import org.apache.log4j.Logger;
+
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+          LogServiceFactory.getLogService(CarbonUtil.class.getName());
+
+  /**
+   * The below method copies a single file to the trash folder. Provide the necessary
+   * timestamp and the segment number in the suffixToAdd variable, so that the proper folder is
+   * created in the trash folder.
+   */
+  public static void copyDataToTrashFolderByFile(String carbonTablePath, String pathOfFileToCopy,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      if (new File(pathOfFileToCopy).exists()) {
+        FileUtils.copyFileToDirectory(new File(pathOfFileToCopy), new File(trashFolderPath));
+        LOGGER.info("File: " + pathOfFileToCopy + " successfully copied to the trash folder: "
+                + trashFolderPath);
+      }
+    } catch (IOException e) {
+      LOGGER.error("Unable to copy " + pathOfFileToCopy + " to the trash folder", e);
+    }
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. Provide the
+   * necessary timestamp and the segment number in the suffixToAdd variable, so that the proper
+   * folder is created in the trash folder.
+   */
+  public static void copyDataToTrashBySegment(CarbonFile path, String carbonTablePath,
+      String suffixToAdd) {
+    String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+        CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+    try {
+      FileUtils.copyDirectory(new File(path.getAbsolutePath()), new File(trashFolderPath));
+      LOGGER.info("Segment: " + path.getAbsolutePath() + " has been copied to the trash folder" +
+          " successfully");
+    } catch (IOException e) {
+      LOGGER.error("Unable to create the trash folder and copy data to it", e);
+    }
+  }
+
+  /**
+   * The below method deletes timestamp subdirectories in the trash folder which have expired as
+   * per the user defined expiration time
+   */
+  public static void deleteAllDataFromTrashFolderByTimeStamp(String carbonTablePath, Long timeStamp)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          String[] aB = carbonFile.getAbsolutePath().split(CarbonCommonConstants.FILE_SEPARATOR);
+          Long currentTime = Long.valueOf(new Timestamp(System.currentTimeMillis()).getTime());
+          Long givenTime = Long.valueOf(aB[aB.length - 1]);
+          // If the timestamp subdirectory is older than the user-defined expiration time,
+          // delete the complete timestamp subdirectory
+          if (givenTime + timeStamp < currentTime) {
+            deleteDataFromTrashFolderByFile(carbonFile);
+          }
+        }
+      } catch (IOException e) {
+        LOGGER.error("Error during deleting from trash folder", e);
+      }
+    }
+  }
+
+  /**
+   * The below method deletes all the files and folders in the trash folder of a carbon table.
+   */
+  public static void deleteAllDataFromTrashFolder(String carbonTablePath)
+          throws IOException {
+    String pathOfTrashFolder = CarbonTablePath.getTrashFolder(carbonTablePath);
+    // if the trash folder exists, delete its contents; if it does not exist, there is
+    // nothing to delete
+    if (FileFactory.isFileExist(pathOfTrashFolder)) {
+      try {
+        List<CarbonFile> carbonFileList = FileFactory.getFolderList(pathOfTrashFolder);
+        for (CarbonFile carbonFile : carbonFileList) {
+          deleteDataFromTrashFolderByFile(carbonFile);
+        }
+      } catch (IOException e) {
+        LOGGER.error("Error during deleting from trash folder", e);
+      }
+    }
+  }
+
+  /**
+   * The below method deletes a specific file in the trash folder.
+   */
+  private static void deleteDataFromTrashFolderByFile(CarbonFile carbonFile) {
+    try {
+      FileFactory.deleteAllCarbonFilesOfDir(carbonFile);

Review comment:
       done
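
The expiry check in `deleteAllDataFromTrashFolderByTimeStamp` quoted above can be illustrated in isolation. The following is a minimal sketch, not the PR's code: the class name `TrashExpirySketch` and the method `isExpired` are hypothetical, and the handling of non-numeric directory names is an added assumption (the quoted code calls `Long.valueOf` directly, which would throw on such a name).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only; not part of TrashUtil.
final class TrashExpirySketch {

  /**
   * Returns true when a trash subdirectory whose name is its creation time
   * (epoch milliseconds) is older than the retention period.
   */
  static boolean isExpired(String dirName, long retentionMillis, long nowMillis) {
    long createdMillis;
    try {
      createdMillis = Long.parseLong(dirName);
    } catch (NumberFormatException e) {
      // Assumption: a directory that is not named after a timestamp is left alone.
      return false;
    }
    // Same comparison as the quoted diff: givenTime + timeStamp < currentTime
    return createdMillis + retentionMillis < nowMillis;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long retention = TimeUnit.DAYS.toMillis(3);
    String fourDaysOld = String.valueOf(now - TimeUnit.DAYS.toMillis(4));
    String oneDayOld = String.valueOf(now - TimeUnit.DAYS.toMillis(1));
    System.out.println(isExpired(fourDaysOld, retention, now)); // true
    System.out.println(isExpired(oneDayOld, retention, now));   // false
  }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the check deterministic and easy to test.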




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r512553653



##########
File path: docs/dml-of-carbondata.md
##########
@@ -562,3 +563,50 @@ CarbonData DML statements are documented here,which includes:
   ```
   CLEAN FILES FOR TABLE carbon_table
   ```
+
+## CLEAN FILES
+
+  Clean files command is used to remove the Compacted and Marked
+  For Delete Segments from the store. Carbondata also supports Trash
+  Folder where all the stale data is moved to after clean files
+  is called
+
+  There are several types of compaction
+
+  ```
+  CLEAN FILES ON TABLE TableName
+  ```
+
+  - **Minor Compaction**

Review comment:
       removed

##########
File path: docs/dml-of-carbondata.md
##########
@@ -562,3 +563,50 @@ CarbonData DML statements are documented here,which includes:
   ```
   CLEAN FILES FOR TABLE carbon_table
   ```
+
+## CLEAN FILES
+
+  Clean files command is used to remove the Compacted and Marked

Review comment:
       linked

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/cleanfiles/CleanFilesUtil.scala
##########
@@ -0,0 +1,409 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.cleanfiles
+
+import java.util
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ListBuffer
+
+import org.apache.spark.sql.{AnalysisException, CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.exception.ConcurrentOperationException
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.locks.{CarbonLockFactory, CarbonLockUtil, ICarbonLock, LockUsage}
+import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata, SegmentFileStore}
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatus, SegmentStatusManager}
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.core.util.path.{CarbonTablePath, TrashUtil}
+import org.apache.carbondata.processing.loading.TableProcessingOperations
+import org.apache.carbondata.processing.loading.model.CarbonLoadModel
+
+object CleanFilesUtil {
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
+  /**
+   * Deletes all data of the table if forceTableClean is true; otherwise cleans only the
+   * garbage segments (MARKED_FOR_DELETE state).
+   *
+   * @param dbName                 : Database name
+   * @param tableName              : Table name
+   * @param tablePath              : Table path
+   * @param carbonTable            : CarbonTable Object <null> in case of force clean
+   * @param forceTableClean        : <true> for force clean it will delete all data
+   *                               <false> it will clean garbage segment (MARKED_FOR_DELETE state)
+   * @param currentTablePartitions : Hive Partitions  details
+   */
+  def cleanFiles(
+    dbName: String,
+    tableName: String,
+    tablePath: String,
+    timeStamp: String,
+    carbonTable: CarbonTable,
+    forceTableClean: Boolean,
+    currentTablePartitions: Option[Seq[PartitionSpec]] = None,
+    truncateTable: Boolean = false): Unit = {
+    var carbonCleanFilesLock: ICarbonLock = null
+    val absoluteTableIdentifier = if (forceTableClean) {
+      AbsoluteTableIdentifier.from(tablePath, dbName, tableName, tableName)
+    } else {
+      carbonTable.getAbsoluteTableIdentifier
+    }
+    try {
+      val errorMsg = "Clean files request is failed for " +
+        s"$dbName.$tableName" +
+        ". Not able to acquire the clean files lock due to another clean files " +
+        "operation is running in the background."
+      // in case of force clean the lock is not required
+      if (forceTableClean) {
+        FileFactory.deleteAllCarbonFilesOfDir(
+          FileFactory.getCarbonFile(absoluteTableIdentifier.getTablePath))
+      } else {
+        carbonCleanFilesLock =
+          CarbonLockUtil
+            .getLockObject(absoluteTableIdentifier, LockUsage.CLEAN_FILES_LOCK, errorMsg)
+        if (truncateTable) {
+          SegmentStatusManager.truncateTable(carbonTable)
+        }
+        SegmentStatusManager.deleteLoadsAndUpdateMetadata(
+          carbonTable, true, currentTablePartitions.map(_.asJava).orNull, timeStamp)
+        CarbonUpdateUtil.cleanUpDeltaFiles(carbonTable, true)
+        currentTablePartitions match {
+          case Some(partitions) =>
+            SegmentFileStore.cleanSegments(
+              carbonTable,
+              currentTablePartitions.map(_.asJava).orNull,
+              timeStamp,
+              true)
+          case _ =>
+        }
+      }
+    } finally {
+      if (currentTablePartitions.equals(None)) {
+        cleanUpPartitionFoldersRecursively(carbonTable, List.empty[PartitionSpec])
+      } else {
+        cleanUpPartitionFoldersRecursively(carbonTable, currentTablePartitions.get.toList)
+      }
+
+      if (carbonCleanFilesLock != null) {
+        CarbonLockUtil.fileUnlock(carbonCleanFilesLock, LockUsage.CLEAN_FILES_LOCK)
+      }
+    }
+  }
+
+
+  /**
+   * delete partition folders recursively
+   *
+   * @param carbonTable
+   * @param partitionSpecList
+   */
+  def cleanUpPartitionFoldersRecursively(carbonTable: CarbonTable,
+      partitionSpecList: List[PartitionSpec]): Unit = {
+    if (carbonTable != null && carbonTable.isHivePartitionTable) {
+      val loadMetadataDetails = SegmentStatusManager
+        .readLoadMetadata(carbonTable.getMetadataPath)
+
+      val carbonFile = FileFactory.getCarbonFile(carbonTable.getTablePath)
+
+      // list all files from table path
+      val listOfDefaultPartFilesIterator = carbonFile.listFiles(true)
+      loadMetadataDetails.foreach { metadataDetail =>
+        if (metadataDetail.getSegmentStatus.equals(SegmentStatus.MARKED_FOR_DELETE) &&
+          metadataDetail.getSegmentFile == null) {
+          val loadStartTime: Long = metadataDetail.getLoadStartTime
+          // delete all files of @loadStartTime from table path
+          cleanCarbonFilesInFolder(listOfDefaultPartFilesIterator, loadStartTime)
+          partitionSpecList.foreach {
+            partitionSpec =>
+              val partitionLocation = partitionSpec.getLocation
+              // For partition folder outside the tablePath
+              if (!partitionLocation.toString.startsWith(carbonTable.getTablePath)) {
+                val partitionCarbonFile = FileFactory
+                  .getCarbonFile(partitionLocation.toString)
+                // list all files from partitionLocation
+                val listOfExternalPartFilesIterator = partitionCarbonFile.listFiles(true)
+                // delete all files of @loadStartTime from externalPath
+                cleanCarbonFilesInFolder(listOfExternalPartFilesIterator, loadStartTime)
+              }
+          }
+        }
+      }
+    }
+  }
+
+  /**
+   *
+   * @param carbonFiles
+   * @param timestamp
+   */
+  private def cleanCarbonFilesInFolder(carbonFiles: java.util.List[CarbonFile],
+      timestamp: Long): Unit = {
+    carbonFiles.asScala.foreach { carbonFile =>
+        val filePath = carbonFile.getPath
+        val fileName = carbonFile.getName
+        if (CarbonTablePath.DataFileUtil.compareCarbonFileTimeStamp(fileName, timestamp)) {
+          FileFactory.deleteFile(filePath)
+        }
+    }
+  }
+
+  /**
+   * The in-progress segments which are in stale state will be marked as deleted
+   * when driver is initializing.
+   *
+   * @param databaseLocation
+   * @param dbName
+   */
+  def cleanInProgressSegments(databaseLocation: String, dbName: String, timeStamp: String): Unit = {
+    val loaderDriver = CarbonProperties.getInstance().
+      getProperty(CarbonCommonConstants.DATA_MANAGEMENT_DRIVER,
+        CarbonCommonConstants.DATA_MANAGEMENT_DRIVER_DEFAULT).toBoolean
+    if (!loaderDriver) {
+      return
+    }
+    try {
+      if (FileFactory.isFileExist(databaseLocation)) {
+        val file = FileFactory.getCarbonFile(databaseLocation)
+        if (file.isDirectory) {
+          val tableFolders = file.listFiles()
+          tableFolders.foreach { tableFolder =>
+            if (tableFolder.isDirectory) {
+              val tablePath = databaseLocation + CarbonCommonConstants.FILE_SEPARATOR +
+               tableFolder.getName
+              val tableUniqueName = CarbonTable.buildUniqueName(dbName, tableFolder.getName)
+              val tableStatusFile =
+                CarbonTablePath.getTableStatusFilePath(tablePath)
+              if (FileFactory.isFileExist(tableStatusFile)) {
+                try {
+                  val carbonTable = CarbonMetadata.getInstance
+                    .getCarbonTable(tableUniqueName)
+                  SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable, true, null,
+                    timeStamp)
+                } catch {
+                  case _: Exception =>
+                    LOGGER.warn(s"Error while cleaning table " + s"$tableUniqueName")
+                }
+              }
+            }
+          }
+        }
+      }
+    } catch {
+      case s: java.io.FileNotFoundException =>
+        LOGGER.error(s)
+    }
+  }
+
+  /**
+   * The below method deletes all the files and folders in the trash folders of all carbon tables
+   * in all databases
+   */
+  def deleteDataFromTrashFolderInAllTables(sparkSession: SparkSession): Unit = {
+    try {
+      val databases = sparkSession.sessionState.catalog.listDatabases()
+      databases.foreach(dbName => {
+        val databaseLocation = CarbonEnv.getDatabaseLocation(dbName, sparkSession)
+        if (FileFactory.isFileExist(databaseLocation)) {
+          val file = FileFactory.getCarbonFile(databaseLocation)
+          if (file.isDirectory) {
+            val tableFolders = file.listFiles()
+            tableFolders.foreach { tableFolder =>
+              if (tableFolder.isDirectory) {
+                val tablePath = databaseLocation +
+                  CarbonCommonConstants.FILE_SEPARATOR + tableFolder.getName
+                TrashUtil.deleteAllDataFromTrashFolder(tablePath)
+              }
+            }
+          }
+        }
+      })
+    } catch {
+      case e: Throwable =>
+        // catch all exceptions to avoid failure
+        LogServiceFactory.getLogService(this.getClass.getCanonicalName)

Review comment:
       done
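
The `cleanFiles` method in the hunk above takes the clean files lock inside a `try` block and releases it in `finally`, so the lock is freed even when the cleanup throws. A minimal Java sketch of that pattern follows. It uses an in-process `ReentrantLock` as a stand-in for CarbonData's table-level lock, which is an assumption made only to keep the example self-contained; `CleanFilesLockSketch` and `runExclusively` are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the clean files lock handling; illustration only.
final class CleanFilesLockSketch {

  // In-process lock used here instead of CarbonData's ICarbonLock (assumption).
  static final ReentrantLock CLEAN_FILES_LOCK = new ReentrantLock();

  /**
   * Runs the cleanup only if the lock can be taken immediately.
   * Returns false when another thread currently holds the lock.
   */
  static boolean runExclusively(Runnable cleanup) {
    if (!CLEAN_FILES_LOCK.tryLock()) {
      return false;
    }
    try {
      cleanup.run();
      return true;
    } finally {
      // Always release, even when cleanup.run() throws.
      CLEAN_FILES_LOCK.unlock();
    }
  }

  public static void main(String[] args) {
    boolean ran = runExclusively(() -> System.out.println("cleaning"));
    System.out.println("ran = " + ran + ", still locked = " + CLEAN_FILES_LOCK.isLocked());
  }
}
```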

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1427,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final String MICROSECONDS_IN_A_DAY = "86400000";
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_TIME = "carbon.trash.expiration.time";

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1427,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final String MICROSECONDS_IN_A_DAY = "86400000";
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_TIME = "carbon.trash.expiration.time";
+
+  /**
+   * Default expiration time of trash folder is 3 days.
+   */
+  public static final String TRASH_EXPIRATION_TIME_DEFAULT = "3";

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1033,7 +1034,7 @@ public static void commitDropPartitions(CarbonTable carbonTable, String uniqueId
    * @throws IOException
    */
   public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitionSpecs,
-      boolean forceDelete) throws IOException {
+      String timeStamp, boolean forceDelete) throws IOException {

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitio
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1428,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final long MILLIS_SECONDS_IN_A_DAY = TimeUnit.DAYS.toMillis(1);
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_DAYS = "carbon.trash.expiration.days";

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1427,6 +1428,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final long MILLIS_SECONDS_IN_A_DAY = TimeUnit.DAYS.toMillis(1);
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_DAYS = "carbon.trash.expiration.days";
+
+  /**
+   * Default expiration time of trash folder is 3 days.
+   */
+  public static final String TRASH_EXPIRATION_DAYS_DEFAULT = "3";

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -1049,7 +1049,7 @@ private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable
   }
 
   public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isForceDeletion,
-      List<PartitionSpec> partitionSpecs) throws IOException {
+      List<PartitionSpec> partitionSpecs, String timeStamp) throws IOException {

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2116,6 +2086,20 @@ public int getMaxSIRepairLimit(String dbName, String tableName) {
     return Math.abs(Integer.parseInt(thresholdValue));
   }
 
+  /**
+   * The below method returns the milliseconds after which the trash folder will expire
+   */
+  public long getTrashFolderExpirationTime() {
+    String configuredValue = getProperty(CarbonCommonConstants.TRASH_EXPIRATION_DAYS,
+            CarbonCommonConstants.TRASH_EXPIRATION_DAYS_DEFAULT);
+    int result = Integer.parseInt(configuredValue);

Review comment:
       added try catch
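
For reference, one way the guarded parsing could look is sketched below. This is a hedged illustration, not the code that was pushed: the class `TrashExpirationConfigSketch`, the method `expirationMillis`, and the choice to fall back to the default on a negative value are assumptions. Only the default of 3 days and the use of `TimeUnit.DAYS.toMillis` come from the quoted diff.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of parsing the trash expiration property; illustration only.
final class TrashExpirationConfigSketch {

  // Mirrors TRASH_EXPIRATION_DAYS_DEFAULT = "3" from the quoted diff.
  static final int DEFAULT_DAYS = 3;

  /** Converts the configured number of days to milliseconds, using the default on bad input. */
  static long expirationMillis(String configuredValue) {
    int days = DEFAULT_DAYS;
    if (configuredValue != null) {
      try {
        int parsed = Integer.parseInt(configuredValue.trim());
        // Assumption: a negative retention makes no sense, so keep the default.
        if (parsed >= 0) {
          days = parsed;
        }
      } catch (NumberFormatException e) {
        // Invalid number: keep the default instead of failing the clean files call.
      }
    }
    return TimeUnit.DAYS.toMillis(days);
  }

  public static void main(String[] args) {
    System.out.println(expirationMillis("1"));   // 86400000
    System.out.println(expirationMillis("abc")); // 259200000
  }
}
```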

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##########
@@ -67,22 +69,23 @@ private static String getSegmentPath(AbsoluteTableIdentifier identifier,
   }
 
   public static void physicalFactAndMeasureMetadataDeletion(CarbonTable carbonTable,
-      LoadMetadataDetails[] newAddedLoadHistoryList,
-      boolean isForceDelete,
-      List<PartitionSpec> specs) {
+      LoadMetadataDetails[] newAddedLoadHistoryList, boolean isForceDelete,
+      List<PartitionSpec> specs, String timeStamp) {

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##########
@@ -792,4 +793,9 @@ public static String getParentPath(String dataFilePath) {
       return dataFilePath;
     }
   }
+
+  public static String getTrashFolder(String carbonTablePath) {

Review comment:
       done
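
The trash layout implied by the diffs in this thread (trash folder, then a timestamp subdirectory, then the segment folder) can be sketched as follows. The directory name `.Trash`, the prefix `Segment_`, the `/` separator, and the class and method names are assumptions for illustration; the body of `getTrashFolder` is not shown in the quoted hunk.

```java
// Hypothetical sketch of the trash folder layout; names and constants are assumptions.
final class TrashPathSketch {

  static final String SEPARATOR = "/";
  static final String TRASH_DIR = ".Trash";        // assumed folder name
  static final String SEGMENT_PREFIX = "Segment_"; // assumed value of LOAD_FOLDER

  /** Trash folder located directly under the table path. */
  static String trashFolder(String tablePath) {
    return tablePath + SEPARATOR + TRASH_DIR;
  }

  /** <tablePath>/<trash>/<timestamp>/<segment folder>, as the suffix is built in deleteSegment. */
  static String segmentTrashPath(String tablePath, long timeStamp, String segmentNo) {
    return trashFolder(tablePath) + SEPARATOR + timeStamp + SEPARATOR + SEGMENT_PREFIX + segmentNo;
  }

  public static void main(String[] args) {
    // prints /store/db/t1/.Trash/1603000000000/Segment_2
    System.out.println(segmentTrashPath("/store/db/t1", 1603000000000L, "2"));
  }
}
```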





[GitHub] [carbondata] vikramahuja1001 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-717131900


   retest this please



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-717189509


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4705/
   

