[GitHub] [carbondata] Zhangshunyu opened a new pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

[GitHub] [carbondata] Zhangshunyu opened a new pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

Zhangshunyu opened a new pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833


    ### Why is this PR needed?
    The tableupdatestatus file always keeps the update info of segments, even after the compacted segments have been deleted. This makes the file grow quickly, which hurts performance.
   
   
    ### What changes were proposed in this PR?
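   Remove the update info of invalid segments, i.e. segments that no longer exist in tablestatus, before writing the tableupdatestatus file.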
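A minimal sketch of the filtering step described above, mirroring the nested loop in the diff. `UpdateDetail` and `LoadDetail` are hypothetical stand-ins for `SegmentUpdateDetails` and `LoadMetadataDetails` that keep only the segment name, and the sample segment names are illustrative, not taken from a real table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterUpdateDetailsSketch {

  // Hypothetical stand-in for SegmentUpdateDetails: only the segment name matters here.
  static final class UpdateDetail {
    final String segmentName;
    UpdateDetail(String segmentName) { this.segmentName = segmentName; }
  }

  // Hypothetical stand-in for LoadMetadataDetails: only the load (segment) name matters here.
  static final class LoadDetail {
    final String loadName;
    LoadDetail(String loadName) { this.loadName = loadName; }
  }

  // Keep only update entries whose segment is still listed in tablestatus.
  // Like the loop in the diff, this does one comparison per (update, load) pair.
  static List<UpdateDetail> keepValidSegments(List<UpdateDetail> updates, List<LoadDetail> loads) {
    List<UpdateDetail> valid = new ArrayList<>();
    for (UpdateDetail update : updates) {
      for (LoadDetail load : loads) {
        if (update.segmentName.equalsIgnoreCase(load.loadName)) {
          valid.add(update);
          break; // one match is enough; avoids adding the same entry twice
        }
      }
    }
    return valid;
  }

  public static void main(String[] args) {
    // Illustrative data: segments 0 and 1 were compacted into 0.1 and then removed
    // by clean files, so tablestatus only lists 0.1 and 2.
    List<UpdateDetail> updates = Arrays.asList(
        new UpdateDetail("0"), new UpdateDetail("1"),
        new UpdateDetail("0.1"), new UpdateDetail("2"));
    List<LoadDetail> loads = Arrays.asList(new LoadDetail("0.1"), new LoadDetail("2"));
    for (UpdateDetail d : keepValidSegments(updates, loads)) {
      System.out.println(d.segmentName);
    }
  }
}
```

Running `main` prints `0.1` and `2`: the entries for the removed segments `0` and `1` are dropped before the file would be written. The cost grows with the product of both list sizes.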
       
    ### Does this PR introduce any user interface change?
    - No
   
   
    ### Is any new testcase added?
    - No
   
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

CarbonDataQA1 commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-655913486


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1594/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

CarbonDataQA1 commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-655914263


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3334/
   



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

Indhumathi27 commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452647862



##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
           mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
         }
 
-        segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+        List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();
+        for (SegmentUpdateDetails updateDetails : oldList) {
+          for (LoadMetadataDetails details : segmentUpdateStatusManager.getLoadMetadataDetails()) {
+            if (updateDetails.getSegmentName().equalsIgnoreCase(details.getLoadName())) {
+              // we should only keep the update info of segments in table status, especially after
+              // compaction and clean files some compacted segments will removed. It can keep

Review comment:
       ```suggestion
                 // compaction and clean files some compacted segments will be removed. It can keep
   ```





[GitHub] [carbondata] marchpure commented on a change in pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

marchpure commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452657326



##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
           mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
         }
 
-        segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+        List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();

Review comment:
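       // Index the tablestatus entries by segment name once, then look up each update entry.
   Map<String, LoadMetadataDetails> details = new HashMap<>();
   for (LoadMetadataDetails loadDetail : segmentUpdateStatusManager.getLoadMetadataDetails()) {
     details.put(loadDetail.getLoadName(), loadDetail);
   }
   for (SegmentUpdateDetails updateDetail : oldList) {
     if (details.containsKey(updateDetail.getSegmentName())) {
       updateDetailsValidSeg.add(updateDetail);
     }
   }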
   





[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

Zhangshunyu commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452657347



##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
           mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
         }
 
-        segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+        List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();
+        for (SegmentUpdateDetails updateDetails : oldList) {
+          for (LoadMetadataDetails details : segmentUpdateStatusManager.getLoadMetadataDetails()) {
+            if (updateDetails.getSegmentName().equalsIgnoreCase(details.getLoadName())) {
+              // we should only keep the update info of segments in table status, especially after
+              // compaction and clean files some compacted segments will removed. It can keep

Review comment:
       @Indhumathi27 OK





[GitHub] [carbondata] marchpure commented on a change in pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

marchpure commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452657326



##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
           mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
         }
 
-        segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+        List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();

Review comment:
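       // Index the tablestatus entries by segment name once, then look up each update entry.
   Map<String, LoadMetadataDetails> details = new HashMap<>();
   for (LoadMetadataDetails loadDetail : segmentUpdateStatusManager.getLoadMetadataDetails()) {
     details.put(loadDetail.getLoadName(), loadDetail);
   }
   for (SegmentUpdateDetails updateDetail : oldList) {
     if (details.containsKey(updateDetail.getSegmentName())) {
       updateDetailsValidSeg.add(updateDetail);
     }
   }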
   





[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

Zhangshunyu commented on a change in pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#discussion_r452663050



##########
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -148,7 +148,20 @@ public static boolean updateSegmentStatus(List<SegmentUpdateDetails> updateDetai
           mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
         }
 
-        segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+        List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();

Review comment:
       @marchpure OK, I will use a HashSet for the check.
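A sketch of the hash-based check, assuming plain segment-name strings stand in for `updateDetail.getSegmentName()` and `loadDetail.getLoadName()`; the class and method names are hypothetical. Building the set once replaces the per-entry scan of tablestatus with an expected constant-time lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class HashSetFilterSketch {

  // Keep update entries whose segment name appears among the tablestatus load names.
  // Both sides are lower-cased with Locale.ROOT so the check stays case-insensitive,
  // like the equalsIgnoreCase comparison in the original nested loop.
  static List<String> keepValidSegments(List<String> updateSegmentNames, List<String> loadNames) {
    // Build the lookup set once: one pass over the tablestatus entries.
    Set<String> validSegments = new HashSet<>();
    for (String loadName : loadNames) {
      validSegments.add(loadName.toLowerCase(Locale.ROOT));
    }
    // Each contains() is expected O(1), so the whole filter is roughly O(updates + loads).
    List<String> kept = new ArrayList<>();
    for (String segmentName : updateSegmentNames) {
      if (validSegments.contains(segmentName.toLowerCase(Locale.ROOT))) {
        kept.add(segmentName);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    System.out.println(keepValidSegments(
        Arrays.asList("0", "1", "0.1", "2"), Arrays.asList("0.1", "2")));
  }
}
```

`main` prints `[0.1, 2]`. For numeric segment names such as `0.1`, lower-casing changes nothing, so a `HashSet` of the raw names would behave the same; the normalization only matters if the comparison must stay case-insensitive in general.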





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

CarbonDataQA1 commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-656581036


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1607/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

CarbonDataQA1 commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-656582778


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3347/
   



[GitHub] [carbondata] marchpure commented on pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

marchpure commented on pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833#issuecomment-656653483


   LGTM



[GitHub] [carbondata] asfgit closed pull request #3833: [CARBONDATA-3894] [IUD] decrease the size of tableupdatestatus file by removing the invalid segments that do not exist in tablestatus

GitBox

asfgit closed pull request #3833:
URL: https://github.com/apache/carbondata/pull/3833


   

