[GitHub] [carbondata] maheshrajus opened a new pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus opened a new pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox

maheshrajus opened a new pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811


   segment mismatch between maintable and SI table when load with concurrency
   
    ### Why is this PR needed?
    segment mismatch between maintable and SI table when load with concurrency
   
    ### What changes were proposed in this PR?
   segment mismatch between maintable and SI table when load with concurrency
   Added control check for SI enable flag of SI tables in concurrent scenarios.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
   
   https://issues.apache.org/jira/browse/CARBONDATA-3874   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox

CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-650333439


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1500/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-650334290


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3228/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#discussion_r448251601



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -390,8 +391,15 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
       }
       val indexTablePath = CarbonTablePath
         .getMetadataPath(tableInfo.getOrCreateAbsoluteTableIdentifier.getTablePath)
+      val metaPathMainTbl: String = carbonTable.getMetadataPath

Review comment:
       If metaPathMainTbl and  metaPathSI variable used in one place only, can replace with  `SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)`

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/SILoadEventListenerForFailedSegments.scala
##########
@@ -78,20 +78,29 @@ class SILoadEventListenerForFailedSegments extends OperationEventListener with L
                   .getTableMetadata(TableIdentifier(indexTableName,
                     Some(carbonLoadModel.getDatabaseName))).storage.properties
                   .getOrElse("isSITableEnabled", "true").toBoolean
+                val indexTable = metaStore
+                  .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+                    sparkSession)
+                  .asInstanceOf[CarbonRelation]
+                  .carbonTable
 
-                if (!isLoadSIForFailedSegments) {
+                val metaPathMainTbl: String = carbonTable.getMetadataPath
+                val listOfLoadFolderDetailsArrayMainTbl: Array[LoadMetadataDetails] =

Review comment:
       Handle Same as above comment




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#discussion_r448252248



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/SILoadEventListenerForFailedSegments.scala
##########
@@ -78,20 +78,29 @@ class SILoadEventListenerForFailedSegments extends OperationEventListener with L
                   .getTableMetadata(TableIdentifier(indexTableName,
                     Some(carbonLoadModel.getDatabaseName))).storage.properties
                   .getOrElse("isSITableEnabled", "true").toBoolean
+                val indexTable = metaStore
+                  .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+                    sparkSession)
+                  .asInstanceOf[CarbonRelation]
+                  .carbonTable
 
-                if (!isLoadSIForFailedSegments) {
+                val metaPathMainTbl: String = carbonTable.getMetadataPath
+                val listOfLoadFolderDetailsArrayMainTbl: Array[LoadMetadataDetails] =

Review comment:
       Better, can move getting load meta details of main table and SI to a method in a scala class like CarbonIndexUtil and reuse




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus commented on a change in pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#discussion_r450329884



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -390,8 +391,15 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
       }
       val indexTablePath = CarbonTablePath
         .getMetadataPath(tableInfo.getOrCreateAbsoluteTableIdentifier.getTablePath)
+      val metaPathMainTbl: String = carbonTable.getMetadataPath

Review comment:
       OK, fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus commented on a change in pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#discussion_r450330869



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/SILoadEventListenerForFailedSegments.scala
##########
@@ -78,20 +78,29 @@ class SILoadEventListenerForFailedSegments extends OperationEventListener with L
                   .getTableMetadata(TableIdentifier(indexTableName,
                     Some(carbonLoadModel.getDatabaseName))).storage.properties
                   .getOrElse("isSITableEnabled", "true").toBoolean
+                val indexTable = metaStore
+                  .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+                    sparkSession)
+                  .asInstanceOf[CarbonRelation]
+                  .carbonTable
 
-                if (!isLoadSIForFailedSegments) {
+                val metaPathMainTbl: String = carbonTable.getMetadataPath
+                val listOfLoadFolderDetailsArrayMainTbl: Array[LoadMetadataDetails] =

Review comment:
       @Indhumathi27  Here we are preparing list of metadata path of main table and si table. So adding a new method for just getting the metadata path/list of metadatapaths is no use in every case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#discussion_r450363174



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -390,8 +391,13 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
       }
       val indexTablePath = CarbonTablePath
         .getMetadataPath(tableInfo.getOrCreateAbsoluteTableIdentifier.getTablePath)
+      val listOfLoadFolderDetailsArrayMainTbl: Array[LoadMetadataDetails] =

Review comment:
       `listOfLoadFolderDetailsArrayMainTbl` variable name is big and not clear, can give a meaningful name like `mainTblLoadMetadataDetails`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#discussion_r450363325



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -390,8 +391,13 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
       }
       val indexTablePath = CarbonTablePath
         .getMetadataPath(tableInfo.getOrCreateAbsoluteTableIdentifier.getTablePath)
+      val listOfLoadFolderDetailsArrayMainTbl: Array[LoadMetadataDetails] =
+        SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+      val listOfLoadFolderDetailsArraySI: Array[LoadMetadataDetails] =

Review comment:
       similar to above




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#discussion_r450364155



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/SILoadEventListenerForFailedSegments.scala
##########
@@ -78,20 +78,27 @@ class SILoadEventListenerForFailedSegments extends OperationEventListener with L
                   .getTableMetadata(TableIdentifier(indexTableName,
                     Some(carbonLoadModel.getDatabaseName))).storage.properties
                   .getOrElse("isSITableEnabled", "true").toBoolean
+                val indexTable = metaStore
+                  .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+                    sparkSession)
+                  .asInstanceOf[CarbonRelation]
+                  .carbonTable
 
-                if (!isLoadSIForFailedSegments) {
+                val listOfLoadFolderDetailsArrayMainTbl: Array[LoadMetadataDetails] =

Review comment:
       `listOfLoadFolderDetailsArrayMainTbl`  same as above comment




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#discussion_r450365209



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/load/CarbonInternalLoaderUtil.java
##########
@@ -304,14 +304,16 @@ public static boolean updateLoadMetadataWithMergeStatus(CarbonTable indexCarbonT
 
   /**
    * Method to check if main table and SI have same number of valid segments or not
-   *
+   * @param mainTableLoadMetadataDetails

Review comment:
       no need of this, just method explanation is enough, as variable name suggest what info is it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654402097


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1572/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654402829


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3310/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654730812


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3312/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654732910


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1574/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654776383


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654787100


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654872025


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3315/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654872523


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1577/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-655339773


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


12