Posted by
GitBox on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-ShreelekhyaG-opened-a-new-pull-request-3988-WIP-Clean-index-files-when-clean-filesd-tp102150p107511.html
ShreelekhyaG commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r611627327
##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -661,6 +687,20 @@ public boolean accept(CarbonFile file) {
}
});
if (listFiles != null && listFiles.length > 0) {
+ Set<String> mergedAndInvalidIndexFiles = getInvalidAndMergedIndexFiles(
+ Arrays.stream(listFiles).map(file -> file.getAbsolutePath())
+ .collect(Collectors.toList()));
+ // Delete index files that are merged.
+ for (CarbonFile indexFile : listFiles) {
+ if (mergedAndInvalidIndexFiles.contains(indexFile.getAbsolutePath())) {
+ indexFile.delete();
Review comment:
The `getSegmentFileForPhysicalDataPartitions` method is called only in the Alter add hive partition flow, and since this is not an old store, the merged index files can be deleted here. We trigger `alterTableMergeIndexEvent` in this flow, but the index files are not deleted immediately after the merge index operation because the `isOldStoreIndexFilesPresent` flag is set to true in this event.
I didn't want to change the flag for the Alter add hive partition flow, because we may actually add old store data (store version <= 1.1), in which case all blocklet info has to be read from the file footer of the carbondata file.
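To make the intent of the loop above easier to follow, here is a minimal, self-contained sketch of the same idea on plain path strings. It is only an illustration: the class `MergedIndexCleanupSketch`, the method `dropMergedIndexFiles`, and the sample paths are hypothetical, and the `mergedOrInvalid` set merely stands in for whatever `getInvalidAndMergedIndexFiles` returns. Nothing here touches the file system or the real `CarbonFile` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of CarbonData.
class MergedIndexCleanupSketch {

  // Splits indexFiles into the paths that stay and the paths that would be
  // deleted. mergedOrInvalid is assumed to be the result of a lookup such as
  // getInvalidAndMergedIndexFiles; its real logic is not reproduced here.
  static List<String> dropMergedIndexFiles(List<String> indexFiles,
      Set<String> mergedOrInvalid, List<String> deleted) {
    List<String> kept = new ArrayList<>();
    for (String path : indexFiles) {
      if (mergedOrInvalid.contains(path)) {
        // The PR calls indexFile.delete() at this point; the sketch only
        // records the path.
        deleted.add(path);
      } else {
        kept.add(path);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Illustrative paths; the file names are made up.
    List<String> indexFiles = Arrays.asList(
        "/store/t1/part=a/0.carbonindex",
        "/store/t1/part=a/1.carbonindex",
        "/store/t1/part=a/0.carbonindexmerge");
    Set<String> mergedOrInvalid = new HashSet<>(Arrays.asList(
        "/store/t1/part=a/0.carbonindex",
        "/store/t1/part=a/1.carbonindex"));
    List<String> deleted = new ArrayList<>();
    List<String> kept = dropMergedIndexFiles(indexFiles, mergedOrInvalid, deleted);
    System.out.println("kept=" + kept + " deleted=" + deleted);
  }
}
```

Doing the cleanup at listing time, as sketched, leaves the `isOldStoreIndexFilesPresent` flag untouched for this flow, which is the trade-off described above.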
##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -1341,13 +1375,13 @@ public static void removeTempFolder(Map<String, FolderDetails> locationMap, Stri
public static Set<String> getIndexFilesListForSegment(Segment segment, String tablePath)
Review comment:
Done
##########
File path: core/src/main/java/org/apache/carbondata/core/readcommitter/LatestFilesReadCommittedScope.java
##########
@@ -135,15 +139,24 @@ private void prepareLoadMetadata() {
} else {
segName = segment.getSegmentFileName();
}
- List<String> index = snapShot.get(segName);
- if (null == index) {
- index = new LinkedList<>();
+ List<String> indexFiles = snapShot.get(segName);
+ if (null == indexFiles) {
+ indexFiles = new LinkedList<>();
}
- for (String indexPath : index) {
- if (indexPath.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) {
- indexFileStore.put(indexPath, indexPath.substring(indexPath.lastIndexOf('/') + 1));
+ Set<String> mergedIndexFiles =
+ SegmentFileStore.getInvalidAndMergedIndexFiles(indexFiles);
+ if (!mergedIndexFiles.isEmpty()) {
+ // do not include already merged indexFiles files details.
+ indexFiles = indexFiles.stream().filter(
Review comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[hidden email]