Posted by
GitBox on
Sep 18, 2020; 7:39am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-Pickupolddriver-opened-a-new-pull-request-3935-CARBONDATA-3993-Remove-deletePartias-tp100496p100578.html
ajantha-bhat commented on a change in pull request #3935:
URL:
https://github.com/apache/carbondata/pull/3935#discussion_r490760218
##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
throw new Exception("Exception in compaction " + exception.getMessage)
}
} finally {
- executor.shutdownNow()
try {
- compactor.deletePartialLoadsInCompaction()
Review comment:
a) When compaction retries, it reuses the same segment ID. If the stale files from the failed attempt are not cleaned up, the retry produces duplicate data.
So, before this change, #3934 needs to be merged, since it can use a unique segment ID for the compaction retry.
b) Please check and move the logic of deletePartialLoadsInCompaction into the clean files command instead of permanently removing it. If the clean files command doesn't have this logic, it may not be able to clean the stale files.
c) Also, if the purpose of this PR is to avoid accidental data loss, you need to handle `cleanStaleDeltaFiles` in `CarbonUpdateUtil.java` and also identify the other places. Handling only a few places will not guarantee that data loss cannot happen.
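
To illustrate point a), here is a minimal toy sketch, not CarbonData code. It models a segment as a list of data files in an in-memory map; every name in it (`CompactionRetrySketch`, `compact`, `read`, the segment IDs `compacted-1` and `compacted-2`) is hypothetical and only serves to show why a retry on the same segment ID can duplicate rows when partial output is left behind, and why a unique segment ID per retry avoids that.

```scala
import scala.collection.mutable

// Toy model only: segment ID -> data files, each file being a list of rows.
// None of these names are CarbonData APIs; they are assumptions for illustration.
object CompactionRetrySketch {
  type Store = mutable.Map[String, mutable.ListBuffer[List[Int]]]

  // Writes one file into the given segment. When `failMidway` is set, only the
  // first half of the rows is written and the attempt reports failure, leaving
  // a stale partial file behind (nothing cleans it up).
  def compact(store: Store, segmentId: String, rows: List[Int], failMidway: Boolean): Boolean = {
    val files = store.getOrElseUpdate(segmentId, mutable.ListBuffer.empty)
    if (failMidway) {
      files += rows.take(rows.size / 2)
      false
    } else {
      files += rows
      true
    }
  }

  // Reading a segment returns every row of every file stored under that ID.
  def read(store: Store, segmentId: String): List[Int] =
    store.get(segmentId).map(_.toList.flatten).getOrElse(Nil)

  // Retry reuses the same segment ID: the stale file is read together with the new one.
  def retrySameId(rows: List[Int]): List[Int] = {
    val store: Store = mutable.Map.empty
    compact(store, "compacted-1", rows, failMidway = true)
    compact(store, "compacted-1", rows, failMidway = false)
    read(store, "compacted-1")
  }

  // Retry uses a fresh segment ID: only the successful attempt's segment is read.
  // The stale file still exists under the old ID until some cleanup removes it.
  def retryUniqueId(rows: List[Int]): List[Int] = {
    val store: Store = mutable.Map.empty
    compact(store, "compacted-1", rows, failMidway = true)
    compact(store, "compacted-2", rows, failMidway = false)
    read(store, "compacted-2")
  }
}
```

In this toy model, `retrySameId(List(1, 2, 3, 4))` returns the first two rows twice, while `retryUniqueId` returns each row once. The leftover file under the failed ID is the kind of stale data that point b) asks the clean files command to handle.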