[GitHub] [carbondata] marchpure opened a new pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

classic Classic list List threaded Threaded
67 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure opened a new pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox

marchpure opened a new pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977


   …rt stage
   
    ### Why is this PR needed?
    In the insertstage flow, there is a empty file with suffix '.loading' to mark the stage in the status of 'in processing'. We update the modifiedtime of '.loading' file for monitoring the insertstage start time, which can be used for calculate TIMEOUT, help to retry and recovery.
   Before, we use setModifiedTime function to update the modifiedtime, which has a serious bug.
   For S3 file, setModifiedTime operation do not take effect. leading to the incorrect inserstage starttime of 'loading' file.
   
    ### What changes were proposed in this PR?
   Update the modifiedtime of loading files based on recreating files.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox

marchpure commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-706815843


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-706844880


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2610/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-706845337


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4360/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

marchpure commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707468141


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

Indhumathi27 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707488860


   @marchpure If setLastModifiedTime operation do not take effect on S3, in other places also, we need to check and update


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#discussion_r503670575



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##########
@@ -533,14 +533,13 @@ case class CarbonInsertFromStageCommand(
             val stageLoadingFile =
               FileFactory.getCarbonFile(stagePath +
                 File.separator + files._1.getName + CarbonTablePath.LOADING_FILE_SUFFIX);
-            // Try to create loading files
-            // make isFailed to be true if createNewFile return false.
-            // the reason can be file exists or exceptions.
-            var isFailed = !stageLoadingFile.createNewFile()
-            // if file exists, modify the lastmodifiedtime of the file.
-            if (isFailed) {
-              // make isFailed to be true if setLastModifiedTime return false.
-              isFailed = !stageLoadingFile.setLastModifiedTime(System.currentTimeMillis());
+            // Try to recreate loading files if the loading file exists
+            // or create loading files directly if the loading file doesn't exist
+            // set isFailed to be false when (delete and) createfile success
+            var isFailed = if (stageLoadingFile.exists()) {

Review comment:
       isFailed will always be false. can remove it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#discussion_r503670575



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##########
@@ -533,14 +533,13 @@ case class CarbonInsertFromStageCommand(
             val stageLoadingFile =
               FileFactory.getCarbonFile(stagePath +
                 File.separator + files._1.getName + CarbonTablePath.LOADING_FILE_SUFFIX);
-            // Try to create loading files
-            // make isFailed to be true if createNewFile return false.
-            // the reason can be file exists or exceptions.
-            var isFailed = !stageLoadingFile.createNewFile()
-            // if file exists, modify the lastmodifiedtime of the file.
-            if (isFailed) {
-              // make isFailed to be true if setLastModifiedTime return false.
-              isFailed = !stageLoadingFile.setLastModifiedTime(System.currentTimeMillis());
+            // Try to recreate loading files if the loading file exists
+            // or create loading files directly if the loading file doesn't exist
+            // set isFailed to be false when (delete and) createfile success
+            var isFailed = if (stageLoadingFile.exists()) {

Review comment:
       isFailed will always be false. can remove it and update the comment




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707502953


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4394/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707504154


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2642/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707651775


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4398/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707653129


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2647/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on a change in pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

marchpure commented on a change in pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#discussion_r503776598



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##########
@@ -533,14 +533,13 @@ case class CarbonInsertFromStageCommand(
             val stageLoadingFile =
               FileFactory.getCarbonFile(stagePath +
                 File.separator + files._1.getName + CarbonTablePath.LOADING_FILE_SUFFIX);
-            // Try to create loading files
-            // make isFailed to be true if createNewFile return false.
-            // the reason can be file exists or exceptions.
-            var isFailed = !stageLoadingFile.createNewFile()
-            // if file exists, modify the lastmodifiedtime of the file.
-            if (isFailed) {
-              // make isFailed to be true if setLastModifiedTime return false.
-              isFailed = !stageLoadingFile.setLastModifiedTime(System.currentTimeMillis());
+            // Try to recreate loading files if the loading file exists
+            // or create loading files directly if the loading file doesn't exist
+            // set isFailed to be false when (delete and) createfile success
+            var isFailed = if (stageLoadingFile.exists()) {

Review comment:
       if IOException happend in stageLoadingFile.createNewFile(), createNewFile() will return FALE, make isFailed to be TRUE




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

marchpure commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707716391


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#discussion_r503947136



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##########
@@ -533,14 +533,13 @@ case class CarbonInsertFromStageCommand(
             val stageLoadingFile =
               FileFactory.getCarbonFile(stagePath +
                 File.separator + files._1.getName + CarbonTablePath.LOADING_FILE_SUFFIX);
-            // Try to create loading files
-            // make isFailed to be true if createNewFile return false.
-            // the reason can be file exists or exceptions.
-            var isFailed = !stageLoadingFile.createNewFile()
-            // if file exists, modify the lastmodifiedtime of the file.
-            if (isFailed) {
-              // make isFailed to be true if setLastModifiedTime return false.
-              isFailed = !stageLoadingFile.setLastModifiedTime(System.currentTimeMillis());
+            // Try to recreate loading files if the loading file exists
+            // or create loading files directly if the loading file doesn't exist
+            // set isFailed to be false when (delete and) createfile success
+            var isFailed = if (stageLoadingFile.exists()) {

Review comment:
       ```suggestion
               val isFailed = if (stageLoadingFile.exists()) {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#discussion_r503982610



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVProvider.java
##########
@@ -545,14 +545,14 @@ private void touchMDTFile() throws IOException {
         FileFactory.createDirectoryAndSetPermission(this.systemDirectory,
             new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL));
       }
-      if (!FileFactory.isFileExist(this.schemaIndexFilePath)) {
-        FileFactory.createNewFile(
-            this.schemaIndexFilePath,
-            new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL));
+      if (FileFactory.isFileExist(this.schemaIndexFilePath)) {

Review comment:
       ```suggestion
         CarbonFile schemaIndexFile = FileFactory.getCarbonFile(this.schemaIndexFilePath);
         if (schemaIndexFile.exists()) {
           schemaIndexFile.delete();
         }
         schemaIndexFile.createNewFile(new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL));
         this.lastModifiedTime = schemaIndexFile.getLastModifiedTime();
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707786164


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4405/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707786599


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2652/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707886571






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-707962385


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4413/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


1234