[jira] [Created] (CARBONDATA-3985) Optimize the segment-timestamp file clean up



Akash R Nilugal (Jira)
suwen created CARBONDATA-3985:
---------------------------------

             Summary: Optimize the segment-timestamp file clean up
                 Key: CARBONDATA-3985
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3985
             Project: CarbonData
          Issue Type: Improvement
          Components: core, spark-integration
            Reporter: suwen


During a data update, the CarbonProjectForUpdateCommand process checks the status of each segment after the delete delta files are generated. If any status is not successful, all segment directories are traversed to clean up the .carbondata, .carbonindex and .deletedelta files that carry the timestamp of this update.

If a great many segments have been generated in the partition directory, this traversal is very time-consuming.

In fact, when cleaning up the timestamp files, we only need to clean up the files in the segment directories involved in this update.

Proposed change: while generating the delete delta files, record the path of every segment involved in this update. Then, in checkAndUpdateStatusFiles(), if a segment status is found to be not successful, clean up directly from the recorded segment path list instead of searching all the segment directories.
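
The idea above could be sketched roughly as follows. This is only an illustration, not the actual CarbonData code: the class SegmentTimestampCleanup and its methods recordSegment(), cleanUp() and isUpdateFile() are hypothetical names, and it assumes that the update timestamp appears in the file name of every file written by the update.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of CarbonData: remembers which segment
// directories an update touched and cleans only those on failure.
final class SegmentTimestampCleanup {

    // Segment directories recorded while the delete delta files are written.
    private final Set<Path> touchedSegments = new LinkedHashSet<>();

    // Call once for every segment a delete delta file is written into.
    void recordSegment(Path segmentDir) {
        touchedSegments.add(segmentDir);
    }

    // Deletes this update's files from the recorded segments only and
    // returns the number of files removed. Other segments are never listed.
    int cleanUp(String timestamp) throws IOException {
        int deleted = 0;
        for (Path dir : touchedSegments) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            // Collect matches first, then delete, so the directory is not
            // modified while it is being listed.
            List<Path> stale = new ArrayList<>();
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path file : files) {
                    if (isUpdateFile(file.getFileName().toString(), timestamp)) {
                        stale.add(file);
                    }
                }
            }
            for (Path file : stale) {
                Files.delete(file);
                deleted++;
            }
        }
        return deleted;
    }

    // Assumption: the timestamp is embedded in the file name.
    static boolean isUpdateFile(String name, String timestamp) {
        boolean knownType = name.endsWith(".carbondata")
                || name.endsWith(".carbonindex")
                || name.endsWith(".deletedelta");
        return knownType && name.contains(timestamp);
    }
}
```

With this shape, the cost of the cleanup depends on the number of segments touched by the update rather than on the total number of segments in the partition directory.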



--
This message was sent by Atlassian Jira
(v8.3.4#803005)