[GitHub] [carbondata] vikramahuja1001 opened a new pull request #4051: [WIP] Only consider .segment files for stale segments


[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox

vikramahuja1001 commented on a change in pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#discussion_r544170327



##########
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##########
@@ -157,11 +157,12 @@ public static void deleteExpiredDataFromTrash(String tablePath) {
     // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
     try {
       if (FileFactory.isFileExist(trashPath)) {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#discussion_r544171003



##########
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##########
@@ -157,11 +157,12 @@ public static void deleteExpiredDataFromTrash(String tablePath) {
     // Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
     try {
       if (FileFactory.isFileExist(trashPath)) {

Review comment:
       @ajantha-bhat, I can raise a PR for that as well.





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#issuecomment-746162203


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5182/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#issuecomment-746162830


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3420/
   



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#discussion_r544833038



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##########
@@ -38,26 +40,33 @@ case class CarbonCleanFilesCommand(
     isInternalCleanCall: Boolean = false)
   extends DataCommand {
 
+  val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
   override def processData(sparkSession: SparkSession): Seq[Row] = {
     Checker.validateTableExists(databaseNameOp, tableName, sparkSession)
     val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName)(sparkSession)
     setAuditTable(carbonTable)
-    // if insert overwrite in progress, do not allow delete segment
-    if (SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
+    // if insert overwrite in progress and table not a MV, do not allow delete segment
+    if (!carbonTable.isMV && SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {

Review comment:
       Not here. Handle this at the place where clean files is called for the MV as part of a clean files call on the main table. Otherwise, if a user calls clean files directly on the MV table while an insert overwrite is running concurrently, no exception is thrown, which is out of sync with the main table's clean files behavior.





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#discussion_r544833244



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##########
@@ -38,26 +40,33 @@ case class CarbonCleanFilesCommand(
     isInternalCleanCall: Boolean = false)
   extends DataCommand {
 
+  val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
   override def processData(sparkSession: SparkSession): Seq[Row] = {
     Checker.validateTableExists(databaseNameOp, tableName, sparkSession)
     val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName)(sparkSession)
     setAuditTable(carbonTable)
-    // if insert overwrite in progress, do not allow delete segment
-    if (SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
+    // if insert overwrite in progress and table not a MV, do not allow delete segment

Review comment:
       ```suggestion
       // if insert overwrite in progress and table is not a MV, do not allow delete segment
   ```





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#discussion_r544839581



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##########
@@ -38,26 +40,33 @@ case class CarbonCleanFilesCommand(
     isInternalCleanCall: Boolean = false)
   extends DataCommand {
 
+  val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
   override def processData(sparkSession: SparkSession): Seq[Row] = {
     Checker.validateTableExists(databaseNameOp, tableName, sparkSession)
     val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName)(sparkSession)
     setAuditTable(carbonTable)
-    // if insert overwrite in progress, do not allow delete segment
-    if (SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
+    // if insert overwrite in progress and table not a MV, do not allow delete segment
+    if (!carbonTable.isMV && SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
       throw new ConcurrentOperationException(carbonTable, "insert overwrite", "clean file")
     }
     if (!carbonTable.getTableInfo.isTransactionalTable) {
       throw new MalformedCarbonCommandException("Unsupported operation on non transactional table")
     }
 
-    val preEvent = CleanFilesPreEvent(carbonTable, sparkSession)
-    val postEvent = CleanFilesPostEvent(carbonTable, sparkSession, options)
-    withEvents(preEvent, postEvent) {
-      DataTrashManager.cleanGarbageData(
-        carbonTable,
-        options.getOrElse("force", "false").toBoolean,
-        options.getOrElse("stale_inprogress", "false").toBoolean,
-        CarbonFilters.getPartitions(Seq.empty[Expression], sparkSession, carbonTable))
+    // only proceed if not a MV and if insert overwrite not in progress
+    if (!carbonTable.isMV && !SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
+      val preEvent = CleanFilesPreEvent(carbonTable, sparkSession)
+      val postEvent = CleanFilesPostEvent(carbonTable, sparkSession, options)
+      withEvents(preEvent, postEvent) {
+        DataTrashManager.cleanGarbageData(
+          carbonTable,
+          options.getOrElse("force", "false").toBoolean,
+          options.getOrElse("stale_inprogress", "false").toBoolean,
+          CarbonFilters.getPartitions(Seq.empty[Expression], sparkSession, carbonTable))
+      }
+    } else {
+      LOGGER.info(s"Can not do clean files operation for the MV: ${carbonTable.getTableName}")

Review comment:
       Please handle this based on isInternalCleanCall.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##########
@@ -38,26 +40,33 @@ case class CarbonCleanFilesCommand(
     isInternalCleanCall: Boolean = false)
   extends DataCommand {
 
+  val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
   override def processData(sparkSession: SparkSession): Seq[Row] = {
     Checker.validateTableExists(databaseNameOp, tableName, sparkSession)
     val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName)(sparkSession)
     setAuditTable(carbonTable)
-    // if insert overwrite in progress, do not allow delete segment
-    if (SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
+    // if insert overwrite in progress and table not a MV, do not allow delete segment
+    if (!carbonTable.isMV && SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {

Review comment:
       Please handle this based on isInternalCleanCall.
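
One possible reading of the two isInternalCleanCall comments above, together with the earlier remark about keeping MV behavior in sync with the main table, is sketched below in Java. It is a simplified model under stated assumptions, not the code merged in this PR: `CleanFilesGuard`, `Decision` and `decide` are hypothetical names, and only the `isInternalCleanCall` flag comes from the diff. The assumption is that an internal call (clean files on an MV triggered from the main table) should skip quietly while an insert overwrite is in progress, whereas a direct user call should be rejected in the same way as for a main table.

```java
// Hypothetical, simplified model of the guard discussed in this review.
// Only the isInternalCleanCall flag is taken from the diff; every other
// name here is invented for illustration.
final class CleanFilesGuard {

  /** Possible outcomes of the pre-check before cleaning. */
  enum Decision {
    CLEAN,   // no insert overwrite running: proceed with clean files
    SKIP,    // internal call during insert overwrite: log and return
    REJECT   // user call during insert overwrite: raise an error
  }

  /**
   * Decides what clean files should do for a table.
   *
   * @param overwriteInProgress whether an insert overwrite is running on the table
   * @param isInternalCleanCall whether the call was triggered internally
   *                            (assumed: from the main table's clean files)
   */
  static Decision decide(boolean overwriteInProgress, boolean isInternalCleanCall) {
    if (!overwriteInProgress) {
      return Decision.CLEAN;
    }
    // Assumption: internal calls must not fail the parent operation,
    // while direct user calls behave exactly like main-table calls.
    return isInternalCleanCall ? Decision.SKIP : Decision.REJECT;
  }
}
```

In this sketch `CleanFilesGuard.decide(true, false)` yields `REJECT`, which corresponds to throwing `ConcurrentOperationException` as the main table does, while `CleanFilesGuard.decide(true, true)` yields `SKIP`, which corresponds to logging and returning without cleaning.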





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#issuecomment-747342414


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3434/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#issuecomment-747342813


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5194/
   



[GitHub] [carbondata] ajantha-bhat commented on pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051#issuecomment-747375802


   LGTM



[GitHub] [carbondata] asfgit closed pull request #4051: [CARBONDATA-4081] Fix multiple issues with clean files command

GitBox
In reply to this post by GitBox

asfgit closed pull request #4051:
URL: https://github.com/apache/carbondata/pull/4051


   

