Login  Register

[GitHub] [carbondata] kunal642 commented on a change in pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

Posted by GitBox on Nov 03, 2020; 1:44pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-ShreelekhyaG-opened-a-new-pull-request-3988-WIP-Clean-index-files-when-clean-filesd-tp102150p103059.html


kunal642 commented on a change in pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#discussion_r516152662



##########
File path: core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java
##########
@@ -79,13 +83,24 @@ public TableStatusReadCommittedScope(AbsoluteTableIdentifier identifier,
   @Override
   public Map<String, String> getCommittedIndexFile(Segment segment) throws IOException {
     Map<String, String> indexFiles;
-    if (segment.getSegmentFileName() == null) {
+    SegmentFileStore fileStore =

Review comment:
       based on the old check if segmentFileName can be null then this code can fail..Please check this

##########
File path: core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java
##########
@@ -79,13 +83,24 @@ public TableStatusReadCommittedScope(AbsoluteTableIdentifier identifier,
   @Override
   public Map<String, String> getCommittedIndexFile(Segment segment) throws IOException {
     Map<String, String> indexFiles;
-    if (segment.getSegmentFileName() == null) {
+    SegmentFileStore fileStore =
+        new SegmentFileStore(identifier.getTablePath(), segment.getSegmentFileName());
+    if (fileStore.getSegmentFile() == null) {
       String path =
           CarbonTablePath.getSegmentPath(identifier.getTablePath(), segment.getSegmentNo());
       indexFiles = new SegmentIndexFileStore().getMergeOrIndexFilesFromSegment(path);
+      List<String> indexFileList = new ArrayList<>(indexFiles.keySet());
+      Set<String> mergedIndexFiles =
+          SegmentFileStore.getInvalidAndMergedIndexFiles(indexFileList);
+      for (String indexFile : indexFileList) {
+        if (mergedIndexFiles
+            .contains(indexFile.substring(indexFile.lastIndexOf(File.separator) + 1))) {
+          // do not include already merged index files details.
+          indexFiles.remove(indexFile);
+        }
+      }
+      indexFileList = null;

Review comment:
       this is not needed

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
##########
@@ -2558,8 +2558,8 @@ public static long getCarbonIndexSize(SegmentFileStore fileStore,
   // Get the total size of carbon data and the total size of carbon index
   public static HashMap<String, Long> getDataSizeAndIndexSize(String tablePath,
       Segment segment) throws IOException {
-    if (segment.getSegmentFileName() != null) {
-      SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
+    SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
+    if (fileStore.getSegmentFile() != null) {

Review comment:
       same as previous check..Please check if this will fail when segmmentFileName is null

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/SegmentIndexFileStore.java
##########
@@ -79,6 +79,11 @@
    */
   private Map<String, List<String>> carbonMergeFileToIndexFilesMap;
 
+  /**
+   * Stores the list of invalid index files of the SI segments in case of small files.
+   */
+  private static Set<String> oldSIIndexAndMergeIndexFiles = new HashSet<>();

Review comment:
       This would fail in case of concurrent queries as 1 operation may override for another

##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -274,20 +274,22 @@ public static boolean updateTableMetadataStatus(Set<Segment> updatedSegmentsList
 
         LoadMetadataDetails[] listOfLoadFolderDetailsArray =
                 SegmentStatusManager.readLoadMetadata(metaDataFilepath);
-
+        Boolean isUpdateRequired = false;

Review comment:
       use boolean instead of Boolean




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]