suwen created CARBONDATA-3985:
---------------------------------
Summary: Optimize the segment-timestamp file clean up
Key: CARBONDATA-3985
URL: https://issues.apache.org/jira/browse/CARBONDATA-3985
Project: CarbonData
Issue Type: Improvement
Components: core, spark-integration
Reporter: suwen
For data updates, the CarbonProjectForUpdateCommand process checks the status of each segment after the delete delta files are generated. If any status is not successful, it traverses all segment directories to clean up the .carbondata, .carbonindex and .deletedelta files that carry this update's timestamp.
If the partition directory contains a large number of segments, this traversal is very time-consuming.
In fact, cleaning up the timestamp files only requires cleaning the files in the segment directories involved in this update.
While generating the delete delta files, record the path of each segment involved in this update. Then, in checkAndUpdateStatusFiles(), if a segment status is found to be not successful, clean up directly from the segment path list recorded during delete delta generation, without searching all segment directories.
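
The idea can be sketched roughly as follows. This is a minimal, standalone illustration and not CarbonData code: the class SegmentCleanupSketch, its methods recordSegment() and cleanUp(), and the assumption that the update timestamp appears as a substring of the file name are all hypothetical and chosen only to show the "record, then clean only what was recorded" pattern.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the CarbonData API. */
final class SegmentCleanupSketch {
    // File types written by an update that must be removed on failure.
    private static final String[] SUFFIXES = {".carbondata", ".carbonindex", ".deletedelta"};

    // Segment directories touched by the current update, in insertion order.
    private final Set<Path> touchedSegments = new LinkedHashSet<>();

    /** Call once per segment while its delete delta file is being generated. */
    void recordSegment(Path segmentDir) {
        touchedSegments.add(segmentDir);
    }

    /**
     * Deletes files that carry the given update timestamp, looking only in
     * the recorded segment directories. Assumes (hypothetically) that the
     * timestamp is a substring of the file name.
     */
    List<Path> cleanUp(String timestamp) throws IOException {
        List<Path> toDelete = new ArrayList<>();
        for (Path dir : touchedSegments) {
            if (!Files.isDirectory(dir)) {
                continue; // segment directory vanished; nothing to clean
            }
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path file : files) {
                    String name = file.getFileName().toString();
                    if (name.contains(timestamp) && hasTargetSuffix(name)) {
                        toDelete.add(file);
                    }
                }
            }
        }
        // Delete after listing so no directory stream is open during removal.
        for (Path file : toDelete) {
            Files.delete(file);
        }
        return toDelete;
    }

    private static boolean hasTargetSuffix(String name) {
        for (String suffix : SUFFIXES) {
            if (name.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, the cost of the cleanup depends on the number of segments touched by the update rather than on the total number of segments under the table or partition directory.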