[GitHub] [carbondata] su-article opened a new pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] su-article opened a new pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox

su-article opened a new pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925


    ### Why is this PR needed?
    For data update, in the CarbonProjectForUpdateCommand process, after the delete delta file is generated, the status of each segment is checked. If the status is not successful, all the segment directories are traversed to clean up the timestamp corresponding .carbondata, .carbonindex and .deletedelta files.
   If a great many segments have been generated in the Partion directory, it will be very time-consuming.
   
    ### What changes were proposed in this PR?
   In the process of generating delete delta, record the segment path involved in this update; after entering the checkAndUpdateStatusFiles() function, if a segment status is found to be not successful, it will be cleaned directly according to the segment path list that has been recorded during generating delete delta, without searching all the segment directories.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox

CarbonDataQA1 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-691857427


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-691863991






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-691904597


   add to whitelist


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-691913393


   @su-article as i can see from code, the delete status which is `deleteStatus`  is updated in `deleteDeltaFunc` function and inside the executor, and we are returning the result in tuple.
   **Observations:**
   1. In `deleteDeltaFunc` function, the `deleteStatus` is updated to success, if everything passes else it will be as failure. If any exception occurs, we catch and throw from the executor, which is complete delete operation failure.
   2. Once the exeception is thrown from the function, its direct caught in the driver side at `CarbonProjectForUpdateCommand` or `CarbonProjectForDeleteCommand` catch block.
   3. The code never hits the `checkAndUpdateStatusFiles` in failure case of delete, as you have mentioned in description.
   4. The `CarbonUpdateUtil.cleanStaleDeltaFiles` is called from the catch block of update and delete command classes, where we pass the `currentTimestamp ` which is the `factTimeStamp `for update and delete.
   
   So, from my analysis, i think you are handling this in place of dead codes and it wont improve anything.
   
   can you please check the above points from your testing and code review? Correct me if im wrong in the analysis.
   
   You may need to handle in `cleanStaleDeltaFiles` call from both the command classes, instead of the existing change. Please check and reply.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 edited a comment on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

akashrn5 edited a comment on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-691913393


   @su-article as i can see from code, the delete status which is `deleteStatus`  is updated in `deleteDeltaFunc` function and inside the executor, and we are returning the result in tuple.
   
   **Observations:**
   1. In `deleteDeltaFunc` function, the `deleteStatus` is updated to success, if everything passes, or else it will be as failure(default value). If any exception occurs, we catch and throw from the executor, which is complete delete operation failure.
   2. Once the exception is thrown from the function, it's directly caught on the driver side at `CarbonProjectForUpdateCommand` or `CarbonProjectForDeleteCommand` catch blocks.
   3. The code never hits the `checkAndUpdateStatusFiles` in failure case of delete, as you have mentioned in description.
   4. The `CarbonUpdateUtil.cleanStaleDeltaFiles` is called from the catch block of update and delete command classes, where we pass the `currentTimestamp ` which is the `factTimeStamp `for update and delete to delete the stale files in case of any failure.
   
   So, from my analysis, i think you are handling this in place of dead codes and it wont improve anything.
   
   can you please check the above points from your testing and code review? Correct me if i'm wrong in the analysis.
   
   You may need to handle in `cleanStaleDeltaFiles` call from both the command classes, instead of the existing change. Please check and reply.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-691915235


   @marchpure can you please check my comment and give your opinion?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-691950938


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4062/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-691952268


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2323/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

marchpure commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-692439483


   Actually, we shall not do the clean work during UPDATE/DELETE. I think we can directly remove this code to avoid any possiblity of data lose.
   
   
   > @marchpure can you please check my comment and give your opinion?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-692456538


   > Actually, we shall not do the clean work during UPDATE/DELETE. I think we can directly remove this code to avoid any possiblity of data lose.
   >
   > > @marchpure can you please check my comment and give your opinion?
   
   @marchpure i agree that, it will create some serious problems in some corner cases. But if we remove the cleaning code now, there can issues of data mismatch or wrong data, which is also a serious one. So basically we need to decide how to delete correctly and at what time. For this already @vikramahuja1001 have raised a discussion, can you please check and give inputs there?. You can check here http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Clean-files-enhancement-tt100088.html


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-693152789


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4089/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3925: [CARBONDATA-3985] Optimize the segment-timestamp file clean up

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3925:
URL: https://github.com/apache/carbondata/pull/3925#issuecomment-693154382


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2349/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]