Posted by GitBox on Dec 06, 2020; 6:10pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-QiangCai-opened-a-new-pull-request-4044-CARBONDATA-4062-Refactor-clean-files-featue-tp104338.html
QiangCai opened a new pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044
### Why is this PR needed?
To guard against accidental data deletion, carbon will introduce trash data management. It provides a buffer period during which an accidental delete operation can be rolled back.
Trash data management is part of carbon's data lifecycle management. Clean files, acting as the data trash manager, should cover the following two parts.
part 1: manage metadata-indexed data trash.
This data stays in its original location in the table and is indexed by metadata. carbon manages it through the metadata index and should avoid using the listFile() interface.
part 2: manage the ".Trash" folder.
Currently the ".Trash" folder has no metadata index, so operations on it rely on timestamps and the listFile() interface. In the future, carbon will index the ".Trash" folder to improve data trash management.
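The difference between the two parts can be illustrated with a minimal sketch. It is not carbon's implementation: the in-memory segment index, the `MARKED_FOR_DELETE` status string, the timestamp-named subfolders under ".Trash", and the retention parameter are all assumptions made for illustration.

```python
import os
import shutil


def expired_by_metadata(segments, now_ms, retention_ms):
    """Part 1: select trash from a metadata index, with no directory listing.

    `segments` is a hypothetical index: a list of dicts with the keys
    'path', 'status' and 'deleted_at_ms'.
    """
    return [
        seg["path"]
        for seg in segments
        if seg["status"] == "MARKED_FOR_DELETE"
        and now_ms - seg["deleted_at_ms"] >= retention_ms
    ]


def expired_in_trash(trash_dir, now_ms, retention_ms):
    """Part 2: select trash by listing the folder and comparing timestamps.

    Assumes each child of `trash_dir` is named after the millisecond
    timestamp at which it was moved to trash (an illustrative layout).
    """
    expired = []
    for name in sorted(os.listdir(trash_dir)):
        if not name.isdigit():
            continue  # skip anything that is not a timestamp-named folder
        if now_ms - int(name) >= retention_ms:
            expired.append(os.path.join(trash_dir, name))
    return expired


def clean(paths):
    """Physically remove the selected trash directories."""
    for path in paths:
        shutil.rmtree(path, ignore_errors=True)
```

In this sketch the metadata-driven path never touches the file system to decide what is expired, while the ".Trash" path pays for a directory listing on every run, which is the cost that indexing the ".Trash" folder would avoid.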
### What changes were proposed in this PR?
- Remove the data clean function from all features, but keep the exception-handling part.
  Note: the following features still clean data:
  a) drop table/database/partition/index/mv
  b) insert/load overwrite table/partition
- Only the clean files function works as the data trash manager now.
- Support running concurrently with other features (loading, compaction, update/delete, and so on).
### Does this PR introduce any user interface change?
- Yes. (please explain the change and update document)
### Is any new testcase added?
- No
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.