[GitHub] [carbondata] Indhumathi27 opened a new pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

classic Classic list List threaded Threaded
209 messages Options
1 ... 891011
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-602183438
 
 
   Build Failed  with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/829/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
In reply to this post by GitBox
Indhumathi27 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-602184282
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-602200827
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/830/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-602204009
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2537/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r396191624
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##########
 @@ -140,9 +152,28 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
       segmentMap.put(segment.getSegmentNo(), segment);
       Set<TableBlockIndexUniqueIdentifier> identifiers =
           getTableBlockIndexUniqueIdentifiers(segment);
-      // get tableBlockIndexUniqueIdentifierWrappers from segment file info
-      getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
-          tableBlockIndexUniqueIdentifierWrappers, identifiers);
+      if (null != partitionsToPrune) {
+        // get tableBlockIndexUniqueIdentifierWrappers from segment file info
+        getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
+            tableBlockIndexUniqueIdentifierWrappers, identifiers);
+      } else {
+        SegmentMetaDataInfo segmentMetaDataInfo = segment.getSegmentMetaDataInfo();
 
 Review comment:
   tableBlockIndexUniqueIdentifierWrappers is just a wrapper around TableBlockIndexUniqueIdentifier.
   So,we should avoid creating TableBlockIndexUniqueIdentifier itself If that filter doesn't match segment min max. [line 153, should not be called if isScanRequired is false]
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r396223698
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##########
 @@ -140,9 +152,28 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
       segmentMap.put(segment.getSegmentNo(), segment);
       Set<TableBlockIndexUniqueIdentifier> identifiers =
           getTableBlockIndexUniqueIdentifiers(segment);
-      // get tableBlockIndexUniqueIdentifierWrappers from segment file info
-      getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
-          tableBlockIndexUniqueIdentifierWrappers, identifiers);
+      if (null != partitionsToPrune) {
+        // get tableBlockIndexUniqueIdentifierWrappers from segment file info
+        getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
+            tableBlockIndexUniqueIdentifierWrappers, identifiers);
+      } else {
+        SegmentMetaDataInfo segmentMetaDataInfo = segment.getSegmentMetaDataInfo();
 
 Review comment:
   If we keep SegmentFileStore inside Segment class and read the segment before calling readCommittedScope.committedIndexFiles.  we can avoid creating the TableBlockIndexUniqueIdentifier object. May be more changes now. Can do in other PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
In reply to this post by GitBox
ajantha-bhat commented on issue #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-602398097
 
 
   LGTM

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r396224060
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##########
 @@ -140,9 +152,28 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
       segmentMap.put(segment.getSegmentNo(), segment);
       Set<TableBlockIndexUniqueIdentifier> identifiers =
           getTableBlockIndexUniqueIdentifiers(segment);
-      // get tableBlockIndexUniqueIdentifierWrappers from segment file info
-      getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
-          tableBlockIndexUniqueIdentifierWrappers, identifiers);
+      if (null != partitionsToPrune) {
+        // get tableBlockIndexUniqueIdentifierWrappers from segment file info
+        getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
+            tableBlockIndexUniqueIdentifierWrappers, identifiers);
+      } else {
+        SegmentMetaDataInfo segmentMetaDataInfo = segment.getSegmentMetaDataInfo();
 
 Review comment:
   ok. Will take up in future PR

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
In reply to this post by GitBox
asfgit closed pull request #3584: [CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
1 ... 891011