[GitHub] [carbondata] ajantha-bhat opened a new pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat opened a new pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox

ajantha-bhat opened a new pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994


    ### Why is this PR needed?
    For compaction, we don't register in-progress segment. so, when unable to get a table status lock. compaction can fail. That time compaction partial segment needs to be cleaned. If the partial segment is failed to clean up due to unable to get lock or IO issues. When the user retries the compaction. carbon uses the same segment id. so while writing the segment file for new compaction. list only the files mapping to the current compaction, not all the files which contain stale files.
   
    ### What changes were proposed in this PR?
   While writing the segment file, consider index files belongs to the current load only in the segment folder.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No [As it happens in concurrent scenario randomly, manually verified]
   
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox

ajantha-bhat commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-714445721


   @QiangCai : please check this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-714556030


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2876/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-714569205


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4632/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-714616941


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-714671495


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4640/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-714711458


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2884/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] QiangCai commented on a change in pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#discussion_r510548190



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -398,27 +398,29 @@ public static void mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
    * @throws IOException
    */
   public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID,

Review comment:
       for the update, it will have more than one timstamp, right




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] QiangCai commented on a change in pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

QiangCai commented on a change in pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#discussion_r510548190



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -398,27 +398,29 @@ public static void mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
    * @throws IOException
    */
   public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID,

Review comment:
       for the update, it will have more than one timestamp, right

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -398,27 +398,29 @@ public static void mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
    * @throws IOException
    */
   public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID,

Review comment:
       for the update, it will have more than one timestamp, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#discussion_r510596257



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -398,27 +398,29 @@ public static void mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
    * @throws IOException
    */
   public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID,

Review comment:
       Agree, Let me check and modify




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#discussion_r510671271



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -398,27 +398,29 @@ public static void mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
    * @throws IOException
    */
   public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID,

Review comment:
       I have pushed now. waiting for the build




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-715171177


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2903/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-715172063


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4659/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#discussion_r510987950



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -398,27 +398,29 @@ public static void mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
    * @throws IOException
    */
   public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID,

Review comment:
       @QiangCai : I have thought about, only non update scenario I can handle this issue. For update there is no easy way currently to find out which is stale and which is not. One way for update is to read old segment file and add files that timestamp greater than old segment file content + old segment file content.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

marchpure commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-716933491


   Hi, Ajantha. I have fixed this in #3999 , please have a check.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-716998184


    > Hi, Ajantha. I have fixed this in #3999 , please have a check.
   
   @marchpure: I think your PR is not handling all the scenarios, so update scenario and test case will fail (even mine also)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

marchpure commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717010368


   > > Hi, Ajantha. I have fixed this in #3999 , please have a check.
   >
   > @marchpure: I think your PR is not handling all the scenarios, so update scenario and test case will fail (even mine also)
   
   yes.  update test case all passes
   i am handing other 10 test failures


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717014461


   @marchpure : If you dont assign UUID for all update case or compaction after update case, all testcases will pass. But the original data mismatch will still happen for segment that has stale files and went for update and compaction (that scenario you can test)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

marchpure commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717015851


   > @marchpure : If you dont assign UUID for all update case or compaction after update case, all testcases will pass. But the original data mismatch will still happen for segment that has stale files and went for update and compaction (that scenario you can test)
   
   yes.  I have test it. update will generate new segment like merge into. which can void recreate segment file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure edited a comment on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

GitBox
In reply to this post by GitBox

marchpure edited a comment on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-717015851


   > @marchpure : If you dont assign UUID for all update case or compaction after update case, all testcases will pass. But the original data mismatch will still happen for segment that has stale files and went for update and compaction (that scenario you can test)
   
   yes.  I have test it. in pr3999, update will generate new segment like merge into. which can void recreate segment file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


12