Posted by
GitBox on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-ShreelekhyaG-opened-a-new-pull-request-3988-WIP-Clean-index-files-when-clean-filesd-tp102150p107560.html
akashrn5 commented on a change in pull request #3988:
URL:
https://github.com/apache/carbondata/pull/3988#discussion_r612948684##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -419,56 +425,66 @@ public boolean accept(CarbonFile file) {
}
/**
- * Get old and invalid files which have already been merged to a mergeindex file.In segment folder
- * we may have both .index files and .mergeindex files, as we are not deleting index files
- * immediately for old tables, this method reads mergeindex file and adds mapped index files to a
- * list and returns.If more than one mergeindex file is present, considers the latest one as valid
- * Ex: We have 3 files in segment. Segment0/ 1.index , 1.mergeindex file, 1.carbondata.
- * 1.index is merged to 1.mergeindex. Here it returns merged index file - 1.index.
+ * Get stale and invalid index files that have already been merged.
+ * As we are not deleting index files immediately after updating old tables,
+ * we will have old index files in segment folder.
+ * This method is called in following scenarios:
+ * 1. During read, when segment file is not present or gets deleted.
+ * 2. When writing segment index size in tablestatus file.
*/
public static Set<String> getInvalidAndMergedIndexFiles(List<String> indexFiles)
throws IOException {
SegmentIndexFileStore indexFileStore = new SegmentIndexFileStore();
Set<String> mergedAndInvalidIndexFiles = new HashSet<>();
long lastModifiedTime = 0L;
- String validIndexFile = null;
+ String validMergeIndexFile = null;
List<String> mergeIndexFileNames = new ArrayList<>();
boolean isIndexFilesPresent = false;
for (String indexFile : indexFiles) {
if (indexFile.endsWith(CarbonTablePath.INDEX_FILE_EXT)) {
isIndexFilesPresent = true;
}
if (indexFile.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) {
- // In case there are more than 1 mergeindex files present, latest one is considered as valid
- // Ex: In SI table, after small files index merge we will have more than 1 mergeindex files
- long timeStamp =
+ long indexFileTimeStamp =
Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(indexFile));
- if (timeStamp > lastModifiedTime) {
- lastModifiedTime = timeStamp;
- validIndexFile = indexFile;
+ // In case there are more than 1 mergeindex files present, latest one is considered as valid
+ if (indexFileTimeStamp > lastModifiedTime) {
+ lastModifiedTime = indexFileTimeStamp;
+ validMergeIndexFile = indexFile;
}
mergeIndexFileNames.add(indexFile);
}
}
- // get the invalid mergeindex files by excluding the valid file.
- if (mergeIndexFileNames.size() > 1 && validIndexFile != null) {
- final String validIndexFileName = validIndexFile;
+ // Possible scenarios where we have >1 merge index file:
+ // 1. SI load when MT has stale index files. As during SI load MT segment file is not updated,
+ // we directly read from segment directory.
+ // 2. In SI table, after small files index merge(refresh command) we will have more than one
+ // mergeindex file as we are not deleting old file immediately. If segment file gets deleted,
+ // then it reads from segment directory.
+ if (mergeIndexFileNames.size() > 1 && validMergeIndexFile != null) {
+ final String validIndexFileName = validMergeIndexFile;
+ // get the invalid mergeindex files by excluding the valid file.
mergedAndInvalidIndexFiles.addAll(
mergeIndexFileNames.stream().filter(file -> !file.equalsIgnoreCase(validIndexFileName))
.collect(Collectors.toSet()));
}
- if (isIndexFilesPresent && validIndexFile != null) {
- indexFileStore.readMergeFile(validIndexFile);
+ // If both index and merge index files are present, read the valid merge index file and add its
+ // mapped index files to the excluding list.
+ // Example: We have xxx.index and xxx.mergeindex files in a segment directory.
+ // Here the merged index file (xxx.index) is considered invalid.
Review comment:
i think this line is confusing, shouldn't it be like ` Here the index file (xxx.index) is considered invalid.` ?
##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -105,7 +105,8 @@ public SegmentFileStore(String tablePath, String segmentFileName) throws IOExcep
*/
public static void writeSegmentFile(String tablePath, final String taskNo, String location,
Review comment:
main method name also can be changed i guess, as internally it always called to write the segment file for partition table.
##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##########
@@ -184,7 +201,15 @@ public boolean accept(CarbonFile file) {
folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
setIndexFileNamesToFolderDetails(folderDetails, carbonFiles);
segmentFile.addPath(location, folderDetails);
- String path = writePath + "/" + CarbonUtil.generateUUID() + CarbonTablePath.SEGMENT_EXT;
+ String path = null;
+ if (isMergeIndexFlow) {
Review comment:
i think its better to add comments in all the places where the merge index will be false saying that the code will be used for developer debugging purpose, so that it wont be confused with actual flow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[hidden email]