VenuReddy2103 opened a new pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772

### Why is this PR needed?
At present, carbon does not do block/blocklet pruning for polygon filter queries. It does row-level filtering at the carbon layer and returns the result. With this approach, all the carbon files are scanned irrespective of whether there are any matching rows in the block. It also incurs Spark overhead to launch many jobs and tasks to process them. This affects the overall performance of polygon queries.

### What changes were proposed in this PR?
Leverage the existing block pruning mechanism in carbon to skip the unwanted blocks, which reduces the number of splits. At the executor side, use blocklet pruning to reduce the number of blocklets to be read and scanned.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]
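To make the pruning idea in the description concrete, here is a minimal Java sketch. It is not CarbonData code: it assumes each block records the min and max of the geo index column and that the polygon has already been converted to a list of `[min, max]` index ranges. The names `BlockPruningSketch`, `BlockStats`, `mayContainMatch`, and `prune` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not CarbonData classes.
final class BlockPruningSketch {

  // Min and max of the geo index column recorded for one block (assumed metadata).
  static final class BlockStats {
    final String name;
    final long min;
    final long max;

    BlockStats(String name, long min, long max) {
      this.name = name;
      this.min = min;
      this.max = max;
    }
  }

  // A block may contain matching rows only if its [min, max] interval
  // intersects at least one polygon range [r[0], r[1]].
  static boolean mayContainMatch(BlockStats block, List<long[]> polygonRanges) {
    for (long[] r : polygonRanges) {
      if (block.min <= r[1] && r[0] <= block.max) {
        return true;
      }
    }
    return false;
  }

  // Keep only the blocks that could hold matching rows; the rest are never scanned.
  static List<BlockStats> prune(List<BlockStats> blocks, List<long[]> polygonRanges) {
    List<BlockStats> kept = new ArrayList<>();
    for (BlockStats block : blocks) {
      if (mayContainMatch(block, polygonRanges)) {
        kept.add(block);
      }
    }
    return kept;
  }
}
```

Fewer surviving blocks means fewer splits for Spark to schedule. The same interval test can be applied again per blocklet on the executor side.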
CarbonDataQA1 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-632497488 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1331/
CarbonDataQA1 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-632498594 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3051/
VenuReddy2103 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-632507623 retest this please
CarbonDataQA1 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-632513834 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1332/
CarbonDataQA1 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-632515075 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3052/
VenuReddy2103 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-633812312 retest this please
CarbonDataQA1 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-633862686
QiangCai commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r437834303

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonExpression.java
##########
@@ -46,15 +51,16 @@
   private CustomIndex<List<Long[]>> instance;
   private List<Long[]> ranges = new ArrayList<Long[]>();
   private ColumnExpression column;
-  private ExpressionResult trueExpRes;
-  private ExpressionResult falseExpRes;
+  private static final ExpressionResult trueExpRes =

Review comment:
Better to remove UnknownExpression from PolygonExpression's superclasses.

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java
##########
@@ -188,6 +189,14 @@ private static FilterExecuter createFilterExecuterTree(
         return new FalseFilterExecutor();
       case ROWLEVEL:
       default:
+        if (filterExpressionResolverTree.getFilterExpression() instanceof UnknownExpression) {

Review comment:
Can we add a new expression type and filter executor type?
kunal642 commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r438818433

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java
##########
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;

Review comment:
Add header

kunal642 commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r438820691

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java
##########
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;
+
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.GenericQueryType;
+import org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.MeasureColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.geo.scan.expression.PolygonExpression;
+
+import org.apache.log4j.Logger;
+
+public class PolygonFilterExecutorImpl extends RowLevelFilterExecuterImpl {
+  public PolygonFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,

Review comment:
Format this class
kunal642 commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r438837210

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java
##########
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;
+
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.GenericQueryType;
+import org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.MeasureColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.geo.scan.expression.PolygonExpression;
+
+import org.apache.log4j.Logger;
+
+public class PolygonFilterExecutorImpl extends RowLevelFilterExecuterImpl {
+  public PolygonFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,
+      List<MeasureColumnResolvedFilterInfo> msrColEvalutorInfoList, Expression exp,
+      AbsoluteTableIdentifier tableIdentifier, SegmentProperties segmentProperties,
+      Map<Integer, GenericQueryType> complexDimensionInfoMap) {
+    super(dimColEvaluatorInfoList, msrColEvalutorInfoList, exp, tableIdentifier, segmentProperties,
+        complexDimensionInfoMap);
+  }
+
+  private int getNearestRangeIndex(List<Long[]> ranges, long searchForNumber) {
+    Long[] range;
+    int low = 0, mid = 0, high = ranges.size() - 1;
+    while (low <= high) {
+      mid = low + ((high - low) / 2);
+      range = ranges.get(mid);
+      if (searchForNumber >= range[0]) {
+        if (searchForNumber <= range[1]) {
+          // Return the range index if the number is between min and max values of the range
+          return mid;
+        } else {
+          // Number is bigger than this range's min and max. Search on the right side of the range
+          low = mid + 1;
+        }
+      } else {
+        // Number is smaller than this range's min and max. Search on the left side of the range
+        high = mid - 1;
+      }
+    }
+    return mid;
+  }
+
+  private boolean isScanRequired(byte[] maxValue, byte[] minValue) {

Review comment:
Please write a detailed explanation for the logic
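Since the review asks for an explanation of the logic, the following self-contained Java sketch shows one way a min/max scan check could be built on the quoted binary search. It is an illustration, not the PR's actual `isScanRequired` body, which is not shown in the thread. It assumes the polygon ranges are sorted ascending and do not overlap, that `min <= max`, and that the min/max bytes have already been decoded to `long`. The class name `RangeScanSketch` and the `long`-based signature are hypothetical.

```java
import java.util.List;

// Hypothetical illustration only; not the code from the PR.
final class RangeScanSketch {

  // Same search as quoted above: returns the index of the range containing the
  // number, or else the last probed index, which is a range adjacent to the gap
  // the number falls in.
  static int getNearestRangeIndex(List<Long[]> ranges, long searchForNumber) {
    int low = 0, mid = 0, high = ranges.size() - 1;
    while (low <= high) {
      mid = low + ((high - low) / 2);
      Long[] range = ranges.get(mid);
      if (searchForNumber >= range[0]) {
        if (searchForNumber <= range[1]) {
          return mid;
        }
        low = mid + 1;
      } else {
        high = mid - 1;
      }
    }
    return mid;
  }

  // Decide whether a block or blocklet with the given min/max could hold a row
  // inside any polygon range. Assumes sorted, non-overlapping ranges and min <= max.
  static boolean isScanRequired(List<Long[]> ranges, long min, long max) {
    if (ranges.isEmpty()) {
      return false; // nothing can match, and index 0 would be out of bounds
    }
    int start = getNearestRangeIndex(ranges, min);
    int end = getNearestRangeIndex(ranges, max);
    if (end > start) {
      // min and max land near different ranges, so a range boundary lies between them
      return true;
    }
    // Same nearest range for both values: scan only if [min, max] intersects it
    Long[] r = ranges.get(start);
    return r[0] <= max && min <= r[1];
  }
}
```

For example, with ranges `[10, 20]`, `[30, 40]`, `[50, 60]`, a blocklet whose values span 21 to 29 sits entirely in a gap and can be skipped, while one spanning 25 to 35 must be scanned.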
VenuReddy2103 commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r440377390

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java
##########
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;

Review comment:
Added

VenuReddy2103 commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r440377512

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java
##########
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;
+
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.GenericQueryType;
+import org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.MeasureColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.geo.scan.expression.PolygonExpression;
+
+import org.apache.log4j.Logger;
+
+public class PolygonFilterExecutorImpl extends RowLevelFilterExecuterImpl {
+  public PolygonFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,

Review comment:
Done
VenuReddy2103 commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r440378220

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java
##########
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;
+
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.GenericQueryType;
+import org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.MeasureColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.geo.scan.expression.PolygonExpression;
+
+import org.apache.log4j.Logger;
+
+public class PolygonFilterExecutorImpl extends RowLevelFilterExecuterImpl {
+  public PolygonFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,
+      List<MeasureColumnResolvedFilterInfo> msrColEvalutorInfoList, Expression exp,
+      AbsoluteTableIdentifier tableIdentifier, SegmentProperties segmentProperties,
+      Map<Integer, GenericQueryType> complexDimensionInfoMap) {
+    super(dimColEvaluatorInfoList, msrColEvalutorInfoList, exp, tableIdentifier, segmentProperties,
+        complexDimensionInfoMap);
+  }
+
+  private int getNearestRangeIndex(List<Long[]> ranges, long searchForNumber) {
+    Long[] range;
+    int low = 0, mid = 0, high = ranges.size() - 1;
+    while (low <= high) {
+      mid = low + ((high - low) / 2);
+      range = ranges.get(mid);
+      if (searchForNumber >= range[0]) {
+        if (searchForNumber <= range[1]) {
+          // Return the range index if the number is between min and max values of the range
+          return mid;
+        } else {
+          // Number is bigger than this range's min and max. Search on the right side of the range
+          low = mid + 1;
+        }
+      } else {
+        // Number is smaller than this range's min and max. Search on the left side of the range
+        high = mid - 1;
+      }
+    }
+    return mid;
+  }
+
+  private boolean isScanRequired(byte[] maxValue, byte[] minValue) {

Review comment:
Added a method header and comments to the code.
VenuReddy2103 commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r440386464

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java
##########
@@ -188,6 +189,14 @@ private static FilterExecuter createFilterExecuterTree(
         return new FalseFilterExecutor();
       case ROWLEVEL:
       default:
+        if (filterExpressionResolverTree.getFilterExpression() instanceof UnknownExpression) {

Review comment:
In the initial version of the feature PR, the polygon expression and executor type were defined explicitly for the polygon filter in the carbondata-core module itself. But Ravi's review comment suggested using `UnknownExpression` (similar to `SparkUnknownExpression`) and having carbondata-geo depend on carbondata-core, not the other way round.
Currently `UnknownExpression` always uses `RowLevelFilterExecuterImpl` and applies the filter to each row without pruning (i.e., no block, blocklet, or page pruning). So I have enhanced `UnknownExpression` with pruning ability.

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonExpression.java
##########
@@ -46,15 +51,16 @@
   private CustomIndex<List<Long[]>> instance;
   private List<Long[]> ranges = new ArrayList<Long[]>();
   private ColumnExpression column;
-  private ExpressionResult trueExpRes;
-  private ExpressionResult falseExpRes;
+  private static final ExpressionResult trueExpRes =

Review comment:
I have replied to this in the comment immediately below. Please let me know if you have any other opinion.
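The dependency direction discussed in this reply can be sketched in Java as follows. This is a hypothetical illustration of the general pattern only: every type name ends in `Sketch`, and none of them are CarbonData's real classes or signatures. The assumption is that the core side exposes an overridable hook on the unknown-expression base type, so a geo-side subclass can supply a pruning-capable executor without core referencing geo.

```java
// Hypothetical illustration only; none of these are CarbonData's real types.

// Role of a filter executor, reduced to a single descriptive method.
interface FilterExecutorSketch {
  String describe();
}

// "Core" side: the default executor evaluates the filter row by row, with no pruning.
final class RowLevelExecutorSketch implements FilterExecutorSketch {
  public String describe() {
    return "row-level, no pruning";
  }
}

// "Core" side: base type for expressions that core knows nothing specific about.
// The overridable hook is the assumed extension point.
abstract class UnknownExpressionSketch {
  FilterExecutorSketch createExecutor() {
    return new RowLevelExecutorSketch();
  }
}

// "Core" side: the factory depends only on the base type, never on geo classes.
final class FilterUtilSketch {
  static FilterExecutorSketch createExecutor(UnknownExpressionSketch expression) {
    return expression.createExecutor();
  }
}

// "Geo" side: an executor that could consult min/max before scanning rows.
final class PolygonExecutorSketch implements FilterExecutorSketch {
  public String describe() {
    return "polygon, min/max pruning then row-level";
  }
}

// "Geo" side: the expression overrides the hook to opt in to pruning.
final class PolygonExpressionSketch extends UnknownExpressionSketch {
  @Override
  FilterExecutorSketch createExecutor() {
    return new PolygonExecutorSketch();
  }
}

// Any other unknown expression keeps the row-level default.
final class OpaqueExpressionSketch extends UnknownExpressionSketch {
}
```

With this shape, adding another pruning-capable expression would need no change on the core side, which is the trade-off against introducing a dedicated expression type and executor type in core.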
VenuReddy2103 commented on a change in pull request #3772:
URL: https://github.com/apache/carbondata/pull/3772#discussion_r440394383

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonExpression.java
##########
@@ -46,15 +51,16 @@
   private CustomIndex<List<Long[]>> instance;
   private List<Long[]> ranges = new ArrayList<Long[]>();
   private ColumnExpression column;
-  private ExpressionResult trueExpRes;
-  private ExpressionResult falseExpRes;
+  private static final ExpressionResult trueExpRes =

Review comment:
I have replied in the comment immediately below. Please let me know if you have any other opinion.
CarbonDataQA1 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-644391323 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1430/
CarbonDataQA1 commented on pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#issuecomment-644391643 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3154/