Indhumathi27 opened a new pull request #3568: [WIP] Prune and Cache only Matched partitions for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
CarbonDataQA1 commented on issue #3568: [WIP] Prune and Cache only Matched partitions for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-572131387 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1537/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3568: [WIP] Prune and Cache only Matched partitions for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-572352006 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1550/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3568: [WIP] Prune and Cache only Matched partitions for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-572353287 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1552/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3568: [WIP] Prune and Cache only Matched partitions for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-572372948 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1553/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-572850496 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1576/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365521952 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java ########## @@ -81,19 +82,20 @@ public abstract DataMapBuilder createBuilder(Segment segment, String shardName, /** * Get the datamap for all segments */ - public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments) - throws IOException { + public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments, + List<PartitionSpec> partitions) throws IOException { Map<Segment, List<CoarseGrainDataMap>> dataMaps = new HashMap<>(); for (Segment segment : segments) { - dataMaps.put(segment, (List<CoarseGrainDataMap>) this.getDataMaps(segment)); + dataMaps.put(segment, (List<CoarseGrainDataMap>) this.getDataMaps(segment, partitions)); } return dataMaps; } /** * Get the datamap for segmentId */ - public abstract List<T> getDataMaps(Segment segment) throws IOException; + public abstract List<T> getDataMaps(Segment segment, List<PartitionSpec> partitions) Review comment: suggest to add one more interface instead of modifying existing one ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522014 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentPropertiesFetcher.java ########## @@ -34,7 +35,7 @@ * @return * @throws IOException */ - SegmentProperties getSegmentProperties(Segment segment) + SegmentProperties getSegmentProperties(Segment segment, List<PartitionSpec> partitionSpecs) Review comment: suggest to add one more interface instead of modifying existing one, keep both interface so that user can choose to use original one if no partition pruning is required. and please modify function description accordingly ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522109 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ########## @@ -136,9 +136,22 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, getTableBlockIndexUniqueIdentifiers(segment); for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : identifiers) { - tableBlockIndexUniqueIdentifierWrappers.add( - new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier, - this.getCarbonTable())); + if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) { + // add only tableBlockUniqueIdentifier that matches the partition + for (PartitionSpec partitionSpec : partitionsToPrune) { + if (partitionSpec.getLocation().toString() + .equalsIgnoreCase(tableBlockIndexUniqueIdentifier.getIndexFilePath())) { Review comment: It is not easy to understand why comparing partition spec location with index file path, can you make it more readable ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522833 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ########## @@ -136,9 +136,22 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, getTableBlockIndexUniqueIdentifiers(segment); for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : identifiers) { - tableBlockIndexUniqueIdentifierWrappers.add( - new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier, - this.getCarbonTable())); + if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) { + // add only tableBlockUniqueIdentifier that matches the partition + for (PartitionSpec partitionSpec : partitionsToPrune) { + if (partitionSpec.getLocation().toString() + .equalsIgnoreCase(tableBlockIndexUniqueIdentifier.getIndexFilePath())) { Review comment: Actually, `tableBlockIndexUniqueIdentifier.getIndexFilePath()` is the index file parent path. I will add comments for the same ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-573333504 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1605/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365667973 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentPropertiesFetcher.java ########## @@ -34,7 +35,7 @@ * @return * @throws IOException */ - SegmentProperties getSegmentProperties(Segment segment) + SegmentProperties getSegmentProperties(Segment segment, List<PartitionSpec> partitionSpecs) Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365667992 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java ########## @@ -81,19 +82,20 @@ public abstract DataMapBuilder createBuilder(Segment segment, String shardName, /** * Get the datamap for all segments */ - public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments) - throws IOException { + public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments, + List<PartitionSpec> partitions) throws IOException { Map<Segment, List<CoarseGrainDataMap>> dataMaps = new HashMap<>(); for (Segment segment : segments) { - dataMaps.put(segment, (List<CoarseGrainDataMap>) this.getDataMaps(segment)); + dataMaps.put(segment, (List<CoarseGrainDataMap>) this.getDataMaps(segment, partitions)); } return dataMaps; } /** * Get the datamap for segmentId */ - public abstract List<T> getDataMaps(Segment segment) throws IOException; + public abstract List<T> getDataMaps(Segment segment, List<PartitionSpec> partitions) Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r366136663 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ########## @@ -115,11 +115,20 @@ public CarbonTable getTable() { final List<PartitionSpec> partitions) throws IOException { final List<ExtendedBlocklet> blocklets = new ArrayList<>(); List<Segment> segments = getCarbonSegments(allsegments); - final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments); + final Map<Segment, List<DataMap>> dataMaps; + if (filter == null || filter.isEmpty() || partitions == null || partitions.isEmpty()) { + dataMaps = dataMapFactory.getDataMaps(segments); + } else { + dataMaps = dataMapFactory.getDataMaps(segments, partitions); + } // for non-filter queries // for filter queries int totalFiles = 0; int datamapsCount = 0; + // In case if filter has matched partitions, update the segment list + if (null != partitions && !partitions.isEmpty()) { + segments = new ArrayList<>(dataMaps.keySet()); + } Review comment: move line 117 in else block, because no need to call getCarbonSegments() in this scenario ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r366142737 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ########## @@ -126,6 +126,14 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, */ public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments) throws IOException { + return getDataMaps(segments, null); + } + + /** + * Get the datamap for all segments + */ + public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments, Review comment: Extract common code from this and getDataMaps with single segment API ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r366187992 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ########## @@ -115,11 +115,20 @@ public CarbonTable getTable() { final List<PartitionSpec> partitions) throws IOException { final List<ExtendedBlocklet> blocklets = new ArrayList<>(); List<Segment> segments = getCarbonSegments(allsegments); - final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments); + final Map<Segment, List<DataMap>> dataMaps; + if (filter == null || filter.isEmpty() || partitions == null || partitions.isEmpty()) { + dataMaps = dataMapFactory.getDataMaps(segments); + } else { + dataMaps = dataMapFactory.getDataMaps(segments, partitions); + } // for non-filter queries // for filter queries int totalFiles = 0; int datamapsCount = 0; + // In case if filter has matched partitions, update the segment list + if (null != partitions && !partitions.isEmpty()) { + segments = new ArrayList<>(dataMaps.keySet()); + } Review comment: It is still required here. All segment files needs to be cached ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r366188046 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ########## @@ -126,6 +126,14 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, */ public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments) throws IOException { + return getDataMaps(segments, null); + } + + /** + * Get the datamap for all segments + */ + public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments, Review comment: done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-574070861 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1631/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jackylk commented on issue #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-574192979 LGTM ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
asfgit closed pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
Free forum by Nabble | Edit this page |