[GitHub] [carbondata] Indhumathi27 opened a new pull request #4067: [WIP] Fix concurrency(Load&Compaction) issue with SI

classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4067: [CARBONDATA-4100] Fix SI segments are in inconsistent state with maintable after concurrent(Load&Compaction) operation

GitBox

Indhumathi27 commented on a change in pull request #4067:
URL: https://github.com/apache/carbondata/pull/4067#discussion_r549728719



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -406,158 +407,195 @@ object CarbonIndexUtil {
       .asInstanceOf[CarbonRelation]
       .carbonTable
 
-    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
-      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
     var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
-    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
-      mainTableDetails.toArray,
-      siTblLoadMetadataDetails)) {
-      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
-        indexTableName)
-      val indexModel = IndexModel(Some(carbonTable.getDatabaseName),
-        indexMetadata.getParentTableName,
-        indexColumns.split(",").toList,
-        indexTableName)
-
-      // var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
-      // If it empty, then no need to do further computations because the
-      // tabletstatus might not have been created and hence next load will take care
-      if (siTblLoadMetadataDetails.isEmpty) {
-        Seq.empty
-      }
+    val compactionLock = CarbonLockFactory.getCarbonLockObj(
+      carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.COMPACTION_LOCK)
+    try {
+      // In some cases, SI table segment might be in COMPACTED state and main table
+      // compaction might be still in progress. In those cases, we can try to take compaction lock
+      // on main table and then compare and add SI segments to failedLoads, to avoid repair
+      // SI SUCCESS loads.
+      if (compactionLock.lockWithRetries()) {
+        var mainTableDetails = try {
+          SegmentStatusManager.readTableStatusFile(CarbonTablePath.getTableStatusFilePath(
+            carbonTable.getTablePath))
+        } catch {
+          case _: Exception =>
+            if (!isLoadOrCompaction) {
+              throw new ConcurrentOperationException(carbonTable.getDatabaseName,
+                carbonTable.getTableName, "table status read", "reindex command")
+            }
+            return;

Review comment:
       already a LOG is present  in readTableStatus file. still need to add ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4067: [CARBONDATA-4100] Fix SI segments are in inconsistent state with maintable after concurrent(Load&Compaction) operation

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4067:
URL: https://github.com/apache/carbondata/pull/4067#discussion_r549729189



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -406,158 +407,195 @@ object CarbonIndexUtil {
       .asInstanceOf[CarbonRelation]
       .carbonTable
 
-    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
-      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
     var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
-    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
-      mainTableDetails.toArray,
-      siTblLoadMetadataDetails)) {
-      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
-        indexTableName)
-      val indexModel = IndexModel(Some(carbonTable.getDatabaseName),
-        indexMetadata.getParentTableName,
-        indexColumns.split(",").toList,
-        indexTableName)
-
-      // var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
-      // If it empty, then no need to do further computations because the
-      // tabletstatus might not have been created and hence next load will take care
-      if (siTblLoadMetadataDetails.isEmpty) {
-        Seq.empty
-      }
+    val compactionLock = CarbonLockFactory.getCarbonLockObj(
+      carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.COMPACTION_LOCK)
+    try {
+      // In some cases, SI table segment might be in COMPACTED state and main table
+      // compaction might be still in progress. In those cases, we can try to take compaction lock
+      // on main table and then compare and add SI segments to failedLoads, to avoid repair
+      // SI SUCCESS loads.
+      if (compactionLock.lockWithRetries()) {
+        var mainTableDetails = try {
+          SegmentStatusManager.readTableStatusFile(CarbonTablePath.getTableStatusFilePath(
+            carbonTable.getTablePath))
+        } catch {
+          case _: Exception =>
+            if (!isLoadOrCompaction) {
+              throw new ConcurrentOperationException(carbonTable.getDatabaseName,
+                carbonTable.getTableName, "table status read", "reindex command")
+            }
+            return;

Review comment:
       No need




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4067: [CARBONDATA-4100] Fix SI segments are in inconsistent state with maintable after concurrent(Load&Compaction) operation

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4067:
URL: https://github.com/apache/carbondata/pull/4067#issuecomment-752106076


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5260/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4067: [CARBONDATA-4100] Fix SI segments are in inconsistent state with maintable after concurrent(Load&Compaction) operation

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4067:
URL: https://github.com/apache/carbondata/pull/4067#issuecomment-752123149


   LGTM.
   
   However `processSIRepair` (base code for this PR) can be optimized further in other PR. multiple time we are reading table status and comparing main table and SI table again again. can be done in one loop.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4067: [CARBONDATA-4100] Fix SI segments are in inconsistent state with maintable after concurrent(Load&Compaction) operation

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4067:
URL: https://github.com/apache/carbondata/pull/4067#issuecomment-752143730


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5261/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4067: [CARBONDATA-4100] Fix SI segments are in inconsistent state with maintable after concurrent(Load&Compaction) operation

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4067:
URL: https://github.com/apache/carbondata/pull/4067#issuecomment-752145672


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3500/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #4067: [CARBONDATA-4100] Fix SI segments are in inconsistent state with maintable after concurrent(Load&Compaction) operation

GitBox
In reply to this post by GitBox

asfgit closed pull request #4067:
URL: https://github.com/apache/carbondata/pull/4067


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


12