[GitHub] [carbondata] shenjiayu17 opened a new pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox

CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711838443


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4507/
   



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507834166



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,66 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files mapped to each block of the specified segment.
+   * First list all deletedelta files in the segment dir, then loop the files and find
+   * a map of blocks and .deletedelta files related to each block.
+   *
+   * @param seg the segment which is to find blocks
+   * @return a map of block and its file list
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
-    String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesList(final Segment seg) {
+
+    Map<String, long[]> blockDeltaStartAndEndTimestampMap = new HashMap<>();

Review comment:
       @shenjiayu17 why exactly do we need this change? Even with the map introduced, we are still doing the list-files call, which is a costly operation. I have some points, please check:
   
   1. There is no need to create these maps and list the files again just to fill them. We already get the block name, and every block has exactly one corresponding delete delta file, so the number of delta files per block is always one.
   2. The update details contain the block name, the actual block name, the timestamps and the delete delta timestamps, so you can check whether the timestamp is not empty, form the delta file name yourself from this information, and return it from this method.
   
   With the above approach you avoid the list-files operation, the timestamp-based filtering and the creation of all these maps, so all of these changes can be avoided.
   
   We always keep the horizontal compaction threshold at 1, and we neither change it nor recommend users to change it to get better performance.
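   
   For illustration only (not code from the PR): a minimal sketch of the naming idea described above, assuming the usual `<blockName>-<timestamp>.deletedelta` convention; the wrapper class and parameter names are hypothetical, only the constants come from the quoted diffs.
   
       // Hypothetical sketch: derive the delete delta file path for a block directly
       // from the timestamps recorded in the update details, instead of listing the
       // segment directory. All names except the constants are illustrative.
       import org.apache.carbondata.core.constants.CarbonCommonConstants;
   
       public final class DeleteDeltaPathSketch {
         private DeleteDeltaPathSketch() { }
   
         // Builds <segmentPath>/<blockName>-<timestamp>.deletedelta
         public static String deleteDeltaPath(String segmentPath, String blockName,
             long deltaTimestamp) {
           return segmentPath + CarbonCommonConstants.FILE_SEPARATOR
               + blockName + CarbonCommonConstants.HYPHEN
               + deltaTimestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT;
         }
       }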

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -173,6 +176,9 @@ object HorizontalCompaction {
 
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Delete Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       same as above

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -125,6 +125,9 @@ object HorizontalCompaction {
       segLists: util.List[Segment]): Unit = {
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Update Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       I think this log does not give any useful info here. If you put a log after line 133 and print `validSegList`, it would be a little more useful.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-712840146


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4539/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-712853350


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2785/
   



[GitHub] [carbondata] ydvpankaj99 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

ydvpankaj99 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713228255


   retest this please



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933445



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -125,6 +125,9 @@ object HorizontalCompaction {
       segLists: util.List[Segment]): Unit = {
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Update Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       This log has been modified; it now prints the validSegList and the time taken to get it.





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933755



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -173,6 +176,9 @@ object HorizontalCompaction {
 
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Delete Compaction operation is getting valid segments for [$db.$table].")

Review comment:
       This log has been modified; it now prints the deletedBlocksList and the time taken to get it.





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508943092



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,66 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files mapped to each block of the specified segment.
+   * First list all deletedelta files in the segment dir, then loop the files and find
+   * a map of blocks and .deletedelta files related to each block.
+   *
+   * @param seg the segment which is to find blocks
+   * @return a map of block and its file list
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
-    String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+  public Map<String, List<CarbonFile>> getDeleteDeltaFilesList(final Segment seg) {
+
+    Map<String, long[]> blockDeltaStartAndEndTimestampMap = new HashMap<>();

Review comment:
       I modified this method to follow the above approach.
   The delete delta file name is now generated from the updateDetails, without listing files.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713264301


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4555/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713265280


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2803/
   



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509190649



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -125,12 +125,18 @@ object HorizontalCompaction {
       segLists: util.List[Segment]): Unit = {
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+    val startTime = System.nanoTime()
+
     // get the valid segments qualified for update compaction.
     val validSegList = CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
 
+    val endTime = System.nanoTime()
+    LOG.info(s"time taken to get segment list for Horizontal Update Compaction is" +

Review comment:
       I don't think the time in this log will help us; just the segment list can help in some way, not the time.





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509190746



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##########
@@ -173,11 +179,17 @@ object HorizontalCompaction {
 
     val db = carbonTable.getDatabaseName
     val table = carbonTable.getTableName
+    val startTime = System.nanoTime()
+
     val deletedBlocksList = CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
 
+    val endTime = System.nanoTime()
+    LOG.info(s"time taken to get deleted block list for Horizontal Delete Compaction is" +

Review comment:
       same as above





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509202313



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
         final long deltaStartTimestamp =
             getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
         final long deltaEndTimeStamp =
             getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+        Set<String> deleteDeltaFiles = new HashSet<>();
+        deleteDeltaFiles.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN + deltaStartTimestamp +
+            CarbonCommonConstants.DELETE_DELTA_FILE_EXT);
+        deleteDeltaFiles.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN + deltaEndTimeStamp +
+            CarbonCommonConstants.DELETE_DELTA_FILE_EXT);

Review comment:
       Why are you adding it twice here, once with the start timestamp and once with the end timestamp? This is wrong, please check.





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509204221



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
         final long deltaStartTimestamp =
             getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
         final long deltaEndTimeStamp =
             getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+        Set<String> deleteDeltaFiles = new HashSet<>();

Review comment:
       Instead of this, you can do the following:
   1. From `SegmentUpdateDetails`, call the `getDeltaFileStamps` method and check that it is not null and its size > 0. If it is not null, take the list of delta timestamps directly and build the final list together with the block name, which we already know.
   2. If `getDeltaFileStamps` returns null, there is only one valid delta timestamp; in this case `SegmentUpdateDetails` will have the same delta start and end timestamp, so you can take one of them, form the delta file name and return only that, as only one file exists. There is no need to create a list just to add a single entry, so you can create the list only in the first case.
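   
   A self-contained sketch of the two cases described above, mirroring the approach the later diff in this thread (r509920810) takes; the helper class and parameter names are illustrative, not CarbonData API.
   
       import java.util.ArrayList;
       import java.util.List;
       import java.util.Set;
   
       // Sketch only: build the delete delta file names for a block, given either the
       // set of recorded delta timestamps (from getDeltaFileStamps) or, when that is
       // null, the single end timestamp. All names here are illustrative.
       final class DeltaFileListSketch {
         static List<String> deltaFileNames(String deleteDeltaFilePrefix, String ext,
             Set<String> deltaFileTimestamps, long deltaEndTimeStamp) {
           List<String> files = new ArrayList<>();
           if (deltaFileTimestamps != null && !deltaFileTimestamps.isEmpty()) {
             // case 1: several delta files exist, one file name per recorded timestamp
             deltaFileTimestamps.forEach(ts -> files.add(deleteDeltaFilePrefix + ts + ext));
           } else {
             // case 2: only one delta file; start and end timestamps are equal,
             // so a single name built from the end timestamp is enough
             files.add(deleteDeltaFilePrefix + deltaEndTimeStamp + ext);
           }
           return files;
         }
       }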





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509221753



##########
File path: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##########
@@ -1039,22 +1039,24 @@ private static boolean isSegmentValid(LoadMetadataDetails seg) {
     if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
       int numberDeleteDeltaFilesThreshold =
           CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-      List<Segment> deleteSegments = new ArrayList<>();
-      for (Segment seg : segments) {
-        if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,
-            numberDeleteDeltaFilesThreshold)) {
-          deleteSegments.add(seg);
+
+      // firstly find the valid segments which are updated from SegmentUpdateDetails,
+      // in order to reduce the segment list for behind traversal
+      List<String> segmentsPresentInSegmentUpdateDetails = new ArrayList<>();

Review comment:
       You don't need to delete the old code; it already enters that logic only if the `updateDetails` are present. So you can just call the `getDeleteDeltaFilesInSeg` method inside `checkDeleteDeltaFilesInSeg`, and there is no need to completely modify `getSegListIUDCompactionQualified`; it will be cleaner.
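   
   In other words, the existing qualification loop quoted in the diff above could stay untouched, and only the inside of `checkDeleteDeltaFilesInSeg` would change. A rough restatement of that unchanged caller (taken from the old code in the diff; the comment only describes the suggestion, nothing here is new API):
   
       List<Segment> deleteSegments = new ArrayList<>();
       for (Segment seg : segments) {
         // checkDeleteDeltaFilesInSeg would internally rely on the update details
         // (e.g. via the new getDeleteDeltaFilesInSeg helper) instead of listing files
         if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,
             numberDeleteDeltaFilesThreshold)) {
           deleteSegments.add(seg);
         }
       }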





[GitHub] [carbondata] akashrn5 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713522116


   > Have you ever tested this optimization? Could you pls give a comparison result for this change?
   As @Zhangshunyu said, it would be better if you could take those readings and update the PR with them. Thanks.



[GitHub] [carbondata] akashrn5 edited a comment on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 edited a comment on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-713522116


   > Have you ever tested this optimization? Could you pls give a comparison result for this change?
   
   As @Zhangshunyu said, it would be better if you could take those readings and update the PR with them. Thanks.



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-714213749







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509920810



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+        } else {
+          final long deltaEndTimeStamp =

Review comment:
       Add a comment here saying that when `deltaFileTimestamps` is null, there is only one delta file and the update details will have the same start and end timestamps.





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509920920



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##########
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+

Review comment:
       remove empty lines



