[GitHub] [carbondata] vikramahuja1001 opened a new pull request #4059: [WIP] Fix concurrent issues in delete segment API's

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 opened a new pull request #4059: [WIP] Fix concurrent issues in delete segment API's

GitBox

vikramahuja1001 opened a new pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059


    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4059: [WIP] Fix concurrent issues in delete segment API's

GitBox

CarbonDataQA2 commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748041835


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3449/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4059: [WIP] Fix concurrent issues in delete segment API's

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748042199


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5209/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4059: [WIP] Fix concurrent issues in delete segment API's

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748086703


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3451/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4059: [WIP] Fix concurrent issues in delete segment API's

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748087168


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5211/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748418100


   @ajantha-bhat @akashrn5 , please review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#discussion_r546530685



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVProvider.java
##########
@@ -540,7 +540,7 @@ private void checkAndReloadSchemas(MVManager viewManager, boolean touchFile)
       }
     }
 
-    private void touchMDTFile() throws IOException {
+    private synchronized void touchMDTFile() throws IOException {

Review comment:
       In saveSchema() method of same file,
   checkAndReloadSchemas(viewManager, true);
           touchMDTFile();
   
   line 432: after passing touch = true, again we are calling touchMDTFile, is itrequired?

##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVProvider.java
##########
@@ -540,7 +540,7 @@ private void checkAndReloadSchemas(MVManager viewManager, boolean touchFile)
       }
     }
 
-    private void touchMDTFile() throws IOException {
+    private synchronized void touchMDTFile() throws IOException {

Review comment:
       @Indhumathi27 , @vikramahuja1001




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#discussion_r546534354



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVProvider.java
##########
@@ -540,7 +540,7 @@ private void checkAndReloadSchemas(MVManager viewManager, boolean touchFile)
       }
     }
 
-    private void touchMDTFile() throws IOException {
+    private synchronized void touchMDTFile() throws IOException {

Review comment:
       @ajantha-bhat yes, it is required. On second MV creation, have to update last modified time of schema INdex file




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#discussion_r546537345



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -445,38 +445,28 @@ private static Integer compareDateValues(Long loadValue, Long userValue) {
           LOG.error("Load metadata file is not present.");
           return loadIds;
         }
-        // read existing metadata details in load metadata.
-        listOfLoadFolderDetailsArray = readLoadMetadata(tableFolderPath);
-        if (listOfLoadFolderDetailsArray.length != 0) {
-          updateDeletionStatus(identifier, loadIds, listOfLoadFolderDetailsArray, invalidLoadIds);
-          if (invalidLoadIds.isEmpty()) {
-            // All or None , if anything fails then don't write
-            if (carbonTableStatusLock.lockWithRetries()) {
-              LOG.info("Table status lock has been successfully acquired");
-              // To handle concurrency scenarios, always take latest metadata before writing
-              // into status file.
-              LoadMetadataDetails[] latestLoadMetadataDetails = readLoadMetadata(tableFolderPath);
-              updateLatestTableStatusDetails(listOfLoadFolderDetailsArray,
-                  latestLoadMetadataDetails);
+        if (carbonTableStatusLock.lockWithRetries()) {
+          LOG.info("Table status lock has been successfully acquired.");
+          listOfLoadFolderDetailsArray = readLoadMetadata(tableFolderPath);
+          if (listOfLoadFolderDetailsArray.length != 0) {
+            updateDeletionStatus(identifier, loadIds, listOfLoadFolderDetailsArray, invalidLoadIds);

Review comment:
       If the table status has 200K segments, finding and marking the matching segments to delete will take lot of time. Holding table status lock for quite a long time will impact concurrent query running. That is why previously they didn't had this logic in the lock.
   
   If some user reports issue because of this PR change, we have to optimize this logic may be.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#discussion_r546538696



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVProvider.java
##########
@@ -540,7 +540,7 @@ private void checkAndReloadSchemas(MVManager viewManager, boolean touchFile)
       }
     }
 
-    private void touchMDTFile() throws IOException {
+    private synchronized void touchMDTFile() throws IOException {

Review comment:
       checkAndReloadSchemas, better to return a flag whether it has touched MDT or not, if false then only we can call touchMDT




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#discussion_r546538696



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVProvider.java
##########
@@ -540,7 +540,7 @@ private void checkAndReloadSchemas(MVManager viewManager, boolean touchFile)
       }
     }
 
-    private void touchMDTFile() throws IOException {
+    private synchronized void touchMDTFile() throws IOException {

Review comment:
       checkAndReloadSchemas, better to return a flag whether it has touched MDT or not, if false then only we can call touchMDT, This way we can avoid calling twice for the first MV itself




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#discussion_r546566836



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVProvider.java
##########
@@ -540,7 +540,7 @@ private void checkAndReloadSchemas(MVManager viewManager, boolean touchFile)
       }
     }
 
-    private void touchMDTFile() throws IOException {
+    private synchronized void touchMDTFile() throws IOException {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#discussion_r546568087



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -445,38 +445,28 @@ private static Integer compareDateValues(Long loadValue, Long userValue) {
           LOG.error("Load metadata file is not present.");
           return loadIds;
         }
-        // read existing metadata details in load metadata.
-        listOfLoadFolderDetailsArray = readLoadMetadata(tableFolderPath);
-        if (listOfLoadFolderDetailsArray.length != 0) {
-          updateDeletionStatus(identifier, loadIds, listOfLoadFolderDetailsArray, invalidLoadIds);
-          if (invalidLoadIds.isEmpty()) {
-            // All or None , if anything fails then don't write
-            if (carbonTableStatusLock.lockWithRetries()) {
-              LOG.info("Table status lock has been successfully acquired");
-              // To handle concurrency scenarios, always take latest metadata before writing
-              // into status file.
-              LoadMetadataDetails[] latestLoadMetadataDetails = readLoadMetadata(tableFolderPath);
-              updateLatestTableStatusDetails(listOfLoadFolderDetailsArray,
-                  latestLoadMetadataDetails);
+        if (carbonTableStatusLock.lockWithRetries()) {
+          LOG.info("Table status lock has been successfully acquired.");
+          listOfLoadFolderDetailsArray = readLoadMetadata(tableFolderPath);
+          if (listOfLoadFolderDetailsArray.length != 0) {
+            updateDeletionStatus(identifier, loadIds, listOfLoadFolderDetailsArray, invalidLoadIds);

Review comment:
       changed logic




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748899932






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748905534


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

asfgit closed pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748922932


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5224/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4059: [CARBONDATA-4092] Fix concurrent issues in delete segment API's and MV flow

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4059:
URL: https://github.com/apache/carbondata/pull/4059#issuecomment-748923714


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3463/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]