[GitHub] [carbondata] akashrn5 opened a new pull request #3854: [WIP]Fix compaction failure issue for SI table and metadata mismatch in concurrency


GitBox

akashrn5 opened a new pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854


   
   
    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3854: [WIP]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox

CarbonDataQA1 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-660975867


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3439/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3854: [WIP]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-660976081


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1697/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-662510411


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1729/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-662520344


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3471/
   



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#discussion_r459476102



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -54,7 +60,70 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
           SegmentStatusManager.deleteLoadsAndUpdateMetadata(
             indexTable, true, partitions.map(_.asJava).orNull)
           CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
+          cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
         }
     }
   }
+
+  /**
+   * This method added to clean the segments which are success in SI and may be compacted or marked
+   * for delete in main table, which can happen in case of concurrent scenarios.
+   */
+  def cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable: CarbonTable,
+      mainTable: CarbonTable): Unit = {
+    val mainTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(mainTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    val indexTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(indexTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    var mainTableLocked = false
+    var indexTableLocked = false
+    try {
+      mainTableLocked = mainTableStatusLock.lockWithRetries()

Review comment:
       If the lock cannot be acquired in a concurrent scenario, would it be better to throw an exception so that the user retries the clean files command?





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#discussion_r459482952



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -54,7 +60,70 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
           SegmentStatusManager.deleteLoadsAndUpdateMetadata(
             indexTable, true, partitions.map(_.asJava).orNull)
           CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
+          cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
         }
     }
   }
+
+  /**
+   * This method added to clean the segments which are success in SI and may be compacted or marked
+   * for delete in main table, which can happen in case of concurrent scenarios.
+   */
+  def cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable: CarbonTable,
+      mainTable: CarbonTable): Unit = {
+    val mainTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(mainTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    val indexTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(indexTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    var mainTableLocked = false
+    var indexTableLocked = false
+    try {
+      mainTableLocked = mainTableStatusLock.lockWithRetries()

Review comment:
       Since this is a clean files operation, there is no need to throw an exception; the cleanup can be retried on the next run.





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#discussion_r459483393



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -54,7 +60,70 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
           SegmentStatusManager.deleteLoadsAndUpdateMetadata(
             indexTable, true, partitions.map(_.asJava).orNull)
           CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
+          cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
         }
     }
   }
+
+  /**
+   * This method added to clean the segments which are success in SI and may be compacted or marked
+   * for delete in main table, which can happen in case of concurrent scenarios.
+   */
+  def cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable: CarbonTable,
+      mainTable: CarbonTable): Unit = {
+    val mainTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(mainTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    val indexTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(indexTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    var mainTableLocked = false
+    var indexTableLocked = false
+    try {
+      mainTableLocked = mainTableStatusLock.lockWithRetries()
+      indexTableLocked = indexTableStatusLock.lockWithRetries()
+      if (mainTableLocked && indexTableLocked) {
+        val mainTableMetadataDetails =
+          SegmentStatusManager.readLoadMetadata(mainTable.getMetadataPath).toSet ++
+          SegmentStatusManager.readLoadHistoryMetadata(mainTable.getMetadataPath).toSet
+        val indexTableMetadataDetails =
+          SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath).toSet
+        val segToStatusMap = mainTableMetadataDetails
+          .map(detail => detail.getLoadName -> detail.getSegmentStatus).toMap
+
+        val unnecessarySegmentsOfSI = indexTableMetadataDetails.filter { indexDetail =>
+          indexDetail.getSegmentStatus.equals(SegmentStatus.SUCCESS) &&
+          segToStatusMap.contains(indexDetail.getLoadName) &&
+          (segToStatusMap(indexDetail.getLoadName).equals(SegmentStatus.COMPACTED) ||
+           segToStatusMap(indexDetail.getLoadName).equals(SegmentStatus.MARKED_FOR_DELETE))
+        }
+        LOGGER.info(s"Unwanted SI segments are: $unnecessarySegmentsOfSI")
+        unnecessarySegmentsOfSI.foreach { detail =>
+          val carbonFile = FileFactory
+            .getCarbonFile(CarbonTablePath
+              .getSegmentPath(indexTable.getTablePath, detail.getLoadName))
+          CarbonUtil.deleteFoldersAndFiles(carbonFile)
+        }
+        unnecessarySegmentsOfSI.foreach { detail =>
+          detail.setSegmentStatus(segToStatusMap(detail.getLoadName))
+          detail.setVisibility("false")
+        }
+        indexTableStatusLock.unlock()

Review comment:
       Release the lock after updating the SI table status. Right now it is released before the update, which can impact concurrent scenarios.
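
The ordering asked for here can be sketched as follows. This is a minimal, self-contained illustration and not the code from the PR: `DemoLock` and `updateUnderLock` are hypothetical names, and `DemoLock` only imitates the `lockWithRetries()` / `unlock()` calls visible in the diff. The point it shows is that the status write happens while the lock is held and the release sits in `finally`.

```scala
import scala.collection.mutable.ArrayBuffer

object LockOrderingSketch {
  // Hypothetical stand-in for a table status lock; it records each call in `events`.
  final class DemoLock(name: String, events: ArrayBuffer[String]) {
    private var held = false
    def lockWithRetries(): Boolean =
      if (held) false else { held = true; events += s"lock:$name"; true }
    def unlock(): Boolean =
      if (held) { held = false; events += s"unlock:$name"; true } else false
  }

  // Runs `writeStatus` only while the lock is held and releases the lock afterwards,
  // even if the write throws. Returns false when the lock could not be acquired.
  def updateUnderLock(lock: DemoLock)(writeStatus: => Unit): Boolean = {
    var locked = false
    try {
      locked = lock.lockWithRetries()
      if (locked) writeStatus
      locked
    } finally {
      if (locked) lock.unlock()
    }
  }
}
```

With this shape, a concurrent writer cannot observe the SI table status between the in-memory change and the write, because the unlock is recorded only after the write has finished.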





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#discussion_r459490020



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -54,7 +60,70 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
           SegmentStatusManager.deleteLoadsAndUpdateMetadata(
             indexTable, true, partitions.map(_.asJava).orNull)
           CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
+          cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
         }
     }
   }
+
+  /**
+   * This method added to clean the segments which are success in SI and may be compacted or marked
+   * for delete in main table, which can happen in case of concurrent scenarios.
+   */
+  def cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable: CarbonTable,
+      mainTable: CarbonTable): Unit = {
+    val mainTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(mainTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    val indexTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(indexTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    var mainTableLocked = false
+    var indexTableLocked = false
+    try {
+      mainTableLocked = mainTableStatusLock.lockWithRetries()
+      indexTableLocked = indexTableStatusLock.lockWithRetries()
+      if (mainTableLocked && indexTableLocked) {
+        val mainTableMetadataDetails =
+          SegmentStatusManager.readLoadMetadata(mainTable.getMetadataPath).toSet ++
+          SegmentStatusManager.readLoadHistoryMetadata(mainTable.getMetadataPath).toSet
+        val indexTableMetadataDetails =
+          SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath).toSet
+        val segToStatusMap = mainTableMetadataDetails
+          .map(detail => detail.getLoadName -> detail.getSegmentStatus).toMap
+
+        val unnecessarySegmentsOfSI = indexTableMetadataDetails.filter { indexDetail =>
+          indexDetail.getSegmentStatus.equals(SegmentStatus.SUCCESS) &&
+          segToStatusMap.contains(indexDetail.getLoadName) &&
+          (segToStatusMap(indexDetail.getLoadName).equals(SegmentStatus.COMPACTED) ||
+           segToStatusMap(indexDetail.getLoadName).equals(SegmentStatus.MARKED_FOR_DELETE))
+        }
+        LOGGER.info(s"Unwanted SI segments are: $unnecessarySegmentsOfSI")
+        unnecessarySegmentsOfSI.foreach { detail =>
+          val carbonFile = FileFactory
+            .getCarbonFile(CarbonTablePath
+              .getSegmentPath(indexTable.getTablePath, detail.getLoadName))
+          CarbonUtil.deleteFoldersAndFiles(carbonFile)
+        }
+        unnecessarySegmentsOfSI.foreach { detail =>
+          detail.setSegmentStatus(segToStatusMap(detail.getLoadName))
+          detail.setVisibility("false")
+        }
+        indexTableStatusLock.unlock()

Review comment:
       done





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#discussion_r459532662



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -54,7 +60,70 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
           SegmentStatusManager.deleteLoadsAndUpdateMetadata(
             indexTable, true, partitions.map(_.asJava).orNull)
           CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
+          cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
         }
     }
   }
+
+  /**
+   * This method added to clean the segments which are success in SI and may be compacted or marked
+   * for delete in main table, which can happen in case of concurrent scenarios.
+   */
+  def cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable: CarbonTable,
+      mainTable: CarbonTable): Unit = {
+    val mainTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(mainTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    val indexTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(indexTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    var mainTableLocked = false
+    var indexTableLocked = false
+    try {
+      mainTableLocked = mainTableStatusLock.lockWithRetries()

Review comment:
       At least add an error log when the lock cannot be acquired, so that the user knows something went wrong and that a retry is needed.
   
   Otherwise a user may run clean files several times in a concurrent scenario without anything being cleaned because of the lock, and will never know why.
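
A possible shape for this, as a sketch only: `StatusLock`, `FixedLock`, `cleanUnderLocks` and the `logError` callback are hypothetical stand-ins and not CarbonData APIs. Unlike the diff, the sketch does not attempt the index table lock when the main table lock is unavailable.

```scala
object LockFailureLoggingSketch {
  // Hypothetical stand-in for a table status lock.
  trait StatusLock {
    def lockWithRetries(): Boolean
    def unlock(): Boolean
  }

  // Demo lock whose acquire result is fixed up front; counts unlock calls.
  final class FixedLock(acquirable: Boolean) extends StatusLock {
    var unlockCalls = 0
    def lockWithRetries(): Boolean = acquirable
    def unlock(): Boolean = { unlockCalls += 1; true }
  }

  // Runs `cleanup` only when both locks are held. If either lock is unavailable,
  // nothing is thrown, but an error is logged so the skipped cleanup is visible.
  def cleanUnderLocks(
      mainLock: StatusLock,
      indexLock: StatusLock,
      indexTableName: String,
      logError: String => Unit)(cleanup: => Unit): Boolean = {
    var mainLocked = false
    var indexLocked = false
    try {
      mainLocked = mainLock.lockWithRetries()
      // Assumption of this sketch: skip the second lock if the first one failed.
      indexLocked = mainLocked && indexLock.lockWithRetries()
      if (mainLocked && indexLocked) {
        cleanup
        true
      } else {
        logError(s"Unable to acquire table status lock; skipped cleanup of SI table " +
          s"$indexTableName. Please retry clean files.")
        false
      }
    } finally {
      // Release only what was actually acquired, in reverse order.
      if (indexLocked) indexLock.unlock()
      if (mainLocked) mainLock.unlock()
    }
  }
}
```

This keeps the behaviour discussed earlier in the thread (no exception from clean files) while still leaving a trace in the log when the cleanup was skipped.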





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#discussion_r459534634



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -54,7 +60,68 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
           SegmentStatusManager.deleteLoadsAndUpdateMetadata(
             indexTable, true, partitions.map(_.asJava).orNull)
           CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
+          cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
         }
     }
   }
+
+  /**
+   * This method added to clean the segments which are success in SI and may be compacted or marked
+   * for delete in main table, which can happen in case of concurrent scenarios.
+   */
+  def cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable: CarbonTable,
+      mainTable: CarbonTable): Unit = {
+    val mainTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(mainTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    val indexTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(indexTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    var mainTableLocked = false
+    var indexTableLocked = false
+    try {
+      mainTableLocked = mainTableStatusLock.lockWithRetries()
+      indexTableLocked = indexTableStatusLock.lockWithRetries()
+      if (mainTableLocked && indexTableLocked) {
+        val mainTableMetadataDetails =
+          SegmentStatusManager.readLoadMetadata(mainTable.getMetadataPath).toSet ++
+          SegmentStatusManager.readLoadHistoryMetadata(mainTable.getMetadataPath).toSet
+        val indexTableMetadataDetails =
+          SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath).toSet
+        val segToStatusMap = mainTableMetadataDetails
+          .map(detail => detail.getLoadName -> detail.getSegmentStatus).toMap
+
+        val unnecessarySegmentsOfSI = indexTableMetadataDetails.filter { indexDetail =>
+          indexDetail.getSegmentStatus.equals(SegmentStatus.SUCCESS) &&
+          segToStatusMap.contains(indexDetail.getLoadName) &&
+          (segToStatusMap(indexDetail.getLoadName).equals(SegmentStatus.COMPACTED) ||
+           segToStatusMap(indexDetail.getLoadName).equals(SegmentStatus.MARKED_FOR_DELETE))
+        }
+        LOGGER.info(s"Unwanted SI segments are: $unnecessarySegmentsOfSI")
+        unnecessarySegmentsOfSI.foreach { detail =>
+          val carbonFile = FileFactory
+            .getCarbonFile(CarbonTablePath
+              .getSegmentPath(indexTable.getTablePath, detail.getLoadName))
+          CarbonUtil.deleteFoldersAndFiles(carbonFile)
+        }
+        unnecessarySegmentsOfSI.foreach { detail =>
+          detail.setSegmentStatus(segToStatusMap(detail.getLoadName))
+          detail.setVisibility("false")
+        }
+        CarbonInternalLoaderUtil.recordLoadMetadata(

Review comment:
       This will fail, because it tries to acquire the lock, which we have not released.
   
   Here we need to call `writeLoadDetailsIntoFile` directly, as we already hold the lock.
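
The failure mode can be reproduced in a small model. Everything below is hypothetical (`NonReentrantLock`, `StatusStore`, `recordWithLock`, `writeDetails`); it assumes, as the comment above implies, that a second acquire of a lock that is already held does not succeed. `recordWithLock` plays the role of a helper that locks internally, and `writeDetails` the role of a direct write used by a caller that already holds the lock.

```scala
object NestedLockSketch {
  // Hypothetical non-reentrant lock: a second acquire while held fails.
  final class NonReentrantLock {
    private var held = false
    def lockWithRetries(): Boolean = if (held) false else { held = true; true }
    def unlock(): Boolean = { val wasHeld = held; held = false; wasHeld }
  }

  final class StatusStore(val lock: NonReentrantLock) {
    var written: Vector[String] = Vector.empty

    // Writes without touching the lock; the caller must already hold it.
    def writeDetails(details: Seq[String]): Unit = {
      written = details.toVector
    }

    // Acquires the lock itself, so it returns false when the caller already holds it.
    def recordWithLock(details: Seq[String]): Boolean = {
      if (lock.lockWithRetries()) {
        try {
          writeDetails(details)
          true
        } finally {
          lock.unlock()
        }
      } else {
        false
      }
    }
  }
}
```

Calling `recordWithLock` while holding `store.lock` returns `false` and writes nothing, whereas `writeDetails` succeeds in the same situation; once the lock is released, `recordWithLock` works again.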





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-663116199


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3483/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-663118362


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1741/
   



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#discussion_r459607797



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -54,7 +60,70 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
           SegmentStatusManager.deleteLoadsAndUpdateMetadata(
             indexTable, true, partitions.map(_.asJava).orNull)
           CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
+          cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
         }
     }
   }
+
+  /**
+   * This method added to clean the segments which are success in SI and may be compacted or marked
+   * for delete in main table, which can happen in case of concurrent scenarios.
+   */
+  def cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable: CarbonTable,
+      mainTable: CarbonTable): Unit = {
+    val mainTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(mainTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    val indexTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(indexTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    var mainTableLocked = false
+    var indexTableLocked = false
+    try {
+      mainTableLocked = mainTableStatusLock.lockWithRetries()

Review comment:
       done

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##########
@@ -54,7 +60,68 @@ class CleanFilesPostEventListener extends OperationEventListener with Logging {
           SegmentStatusManager.deleteLoadsAndUpdateMetadata(
             indexTable, true, partitions.map(_.asJava).orNull)
           CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
+          cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
         }
     }
   }
+
+  /**
+   * This method added to clean the segments which are success in SI and may be compacted or marked
+   * for delete in main table, which can happen in case of concurrent scenarios.
+   */
+  def cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable: CarbonTable,
+      mainTable: CarbonTable): Unit = {
+    val mainTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(mainTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    val indexTableStatusLock: ICarbonLock = CarbonLockFactory
+      .getCarbonLockObj(indexTable.getAbsoluteTableIdentifier, LockUsage.TABLE_STATUS_LOCK)
+    var mainTableLocked = false
+    var indexTableLocked = false
+    try {
+      mainTableLocked = mainTableStatusLock.lockWithRetries()
+      indexTableLocked = indexTableStatusLock.lockWithRetries()
+      if (mainTableLocked && indexTableLocked) {
+        val mainTableMetadataDetails =
+          SegmentStatusManager.readLoadMetadata(mainTable.getMetadataPath).toSet ++
+          SegmentStatusManager.readLoadHistoryMetadata(mainTable.getMetadataPath).toSet
+        val indexTableMetadataDetails =
+          SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath).toSet
+        val segToStatusMap = mainTableMetadataDetails
+          .map(detail => detail.getLoadName -> detail.getSegmentStatus).toMap
+
+        val unnecessarySegmentsOfSI = indexTableMetadataDetails.filter { indexDetail =>
+          indexDetail.getSegmentStatus.equals(SegmentStatus.SUCCESS) &&
+          segToStatusMap.contains(indexDetail.getLoadName) &&
+          (segToStatusMap(indexDetail.getLoadName).equals(SegmentStatus.COMPACTED) ||
+           segToStatusMap(indexDetail.getLoadName).equals(SegmentStatus.MARKED_FOR_DELETE))
+        }
+        LOGGER.info(s"Unwanted SI segments are: $unnecessarySegmentsOfSI")
+        unnecessarySegmentsOfSI.foreach { detail =>
+          val carbonFile = FileFactory
+            .getCarbonFile(CarbonTablePath
+              .getSegmentPath(indexTable.getTablePath, detail.getLoadName))
+          CarbonUtil.deleteFoldersAndFiles(carbonFile)
+        }
+        unnecessarySegmentsOfSI.foreach { detail =>
+          detail.setSegmentStatus(segToStatusMap(detail.getLoadName))
+          detail.setVisibility("false")
+        }
+        CarbonInternalLoaderUtil.recordLoadMetadata(

Review comment:
       done





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-663196533


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1743/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-663197215


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3485/
   



[GitHub] [carbondata] akashrn5 commented on pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-663346662


   @ajantha-bhat please review and merge



[GitHub] [carbondata] ajantha-bhat commented on pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854#issuecomment-663515404


   LGTM



[GitHub] [carbondata] asfgit closed pull request #3854: [CARBONDATA-3920]Fix compaction failure issue for SI table and metadata mismatch in concurrency

GitBox
In reply to this post by GitBox

asfgit closed pull request #3854:
URL: https://github.com/apache/carbondata/pull/3854


   

