[GitHub] [carbondata] dhatchayani commented on a change in pull request #3148: [CARBONDATA-3293] Prune datamaps improvement for count(*)


GitBox
dhatchayani commented on a change in pull request #3148: [CARBONDATA-3293] Prune datamaps improvement for count(*)
URL: https://github.com/apache/carbondata/pull/3148#discussion_r266315551
 
 

 ##########
 File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ##########
 @@ -624,36 +624,41 @@ public BlockMappingVO getBlockRowCount(Job job, CarbonTable table,
           .clearInvalidSegments(getOrCreateCarbonTable(job.getConfiguration()),
               toBeCleanedSegments);
     }
-    List<ExtendedBlocklet> blocklets =
-        blockletMap.prune(filteredSegment, (FilterResolverIntf) null, partitions);
-    for (ExtendedBlocklet blocklet : blocklets) {
-      String blockName = blocklet.getPath();
-      blockName = CarbonTablePath.getCarbonDataFileName(blockName);
-      blockName = blockName + CarbonTablePath.getCarbonDataExtension();
-
-      long rowCount = blocklet.getDetailInfo().getRowCount();
-
-      String segmentId = Segment.toSegment(blocklet.getSegmentId()).getSegmentNo();
-      String key = CarbonUpdateUtil.getSegmentBlockNameKey(segmentId, blockName);
-
-      // if block is invalid then don't add the count
-      SegmentUpdateDetails details = updateStatusManager.getDetailsForABlock(key);
-
-      if (null == details || !CarbonUpdateUtil.isBlockInvalid(details.getSegmentStatus())) {
-        Long blockCount = blockRowCountMapping.get(key);
-        if (blockCount == null) {
-          blockCount = 0L;
-          Long count = segmentAndBlockCountMapping.get(segmentId);
-          if (count == null) {
-            count = 0L;
+    Map<String, Long> blockletToRowCountMap =
+        defaultDataMap.getBlockRowCount(filteredSegment, partitions, defaultDataMap, isUpdateFlow);
+    if (isIUDTable || isUpdateFlow) {
+      // key is (segmentId + "," + blockletPath) and value is the row count of that blocklet
+      for (Map.Entry<String, Long> eachBlocklet : blockletToRowCountMap.entrySet()) {
+        String[] segmentIdAndPath = eachBlocklet.getKey().split(",", 2);
+        String segmentId = segmentIdAndPath[0];
+        String blockName = segmentIdAndPath[1];
+        blockName = CarbonTablePath.getCarbonDataFileName(blockName);
+        blockName = blockName + CarbonTablePath.getCarbonDataExtension();
+
+        long rowCount = eachBlocklet.getValue();
+
+        String key = CarbonUpdateUtil.getSegmentBlockNameKey(segmentId, blockName);
+
+        // if block is invalid then don't add the count
+        SegmentUpdateDetails details = updateStatusManager.getDetailsForABlock(key);
+
+        if (null == details || !CarbonUpdateUtil.isBlockInvalid(details.getSegmentStatus())) {
+          Long blockCount = blockRowCountMapping.get(key);
+          if (blockCount == null) {
+            blockCount = 0L;
+            Long count = segmentAndBlockCountMapping.get(segmentId);
+            if (count == null) {
+              count = 0L;
+            }
+            segmentAndBlockCountMapping.put(segmentId, count + 1);
           }
-          segmentAndBlockCountMapping.put(segmentId, count + 1);
+          blockCount += rowCount;
+          blockRowCountMapping.put(key, blockCount);
 
 Review comment:
   Yes. We need to call put every time, for every block. In the update/delete case, the block-level row count is needed. Also, blockCount will not be null at this point; it can only be 0.
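
   To illustrate the point, here is a minimal sketch of the accumulation pattern using only `java.util` maps. It is not the CarbonData implementation: the class `BlockRowCountSketch`, the holder `Counts`, the helper `blockKey`, and the `#blockletId` suffix on the path are all hypothetical names invented for this example, and the invalid-block check from the diff is left out. It only shows why `put` runs on every iteration and why `blockCount` is never null by the time the row count is added.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names exist in CarbonData.
class BlockRowCountSketch {

  // Holder for the two result maps (TreeMap only to get stable print order).
  static final class Counts {
    final Map<String, Long> blockRowCount = new TreeMap<>();
    final Map<String, Long> segmentBlockCount = new TreeMap<>();
  }

  // Assumed stand-in for CarbonUpdateUtil.getSegmentBlockNameKey.
  static String blockKey(String segmentId, String blockName) {
    return segmentId + "/" + blockName;
  }

  // Input key format assumed here: segmentId + "," + blockPath + "#" + blockletId.
  static Counts accumulate(Map<String, Long> blockletToRowCount) {
    Counts counts = new Counts();
    for (Map.Entry<String, Long> entry : blockletToRowCount.entrySet()) {
      String[] segmentIdAndPath = entry.getKey().split(",", 2);
      String segmentId = segmentIdAndPath[0];
      String path = segmentIdAndPath[1];
      // Drop the invented "#blockletId" suffix so blocklets of one block share a key.
      int hash = path.indexOf('#');
      String blockName = hash >= 0 ? path.substring(0, hash) : path;
      String key = blockKey(segmentId, blockName);
      long rowCount = entry.getValue();

      Long blockCount = counts.blockRowCount.get(key);
      if (blockCount == null) {
        // First blocklet seen for this block: start at 0 and count the block once.
        blockCount = 0L;
        counts.segmentBlockCount.merge(segmentId, 1L, Long::sum);
      }
      // Runs for every blocklet: blockCount is non-null here (0 or a running sum).
      blockCount += rowCount;
      counts.blockRowCount.put(key, blockCount);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Long> input = new LinkedHashMap<>();
    input.put("0,part-0.carbondata#0", 10L);
    input.put("0,part-0.carbondata#1", 20L);
    input.put("0,part-1.carbondata#0", 5L);
    input.put("1,part-0.carbondata#0", 7L);

    Counts counts = accumulate(input);
    System.out.println(counts.blockRowCount);
    System.out.println(counts.segmentBlockCount);
  }
}
```

   With the sample input, the two blocklets of `part-0.carbondata` in segment 0 are summed into a single block-level entry, while the segment-level map counts each block only once. If the `put` were moved inside the null check, the second blocklet's rows would be lost.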
