GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/2467 [WIP] [CARBONDATA-2649] Add code for caching min/max only for specified columns Things done as part of this PR 1. Supported configuring column for caching min/max in driver 2. Added test cases for query verification for COLUMN_META_CACHE and CACHE_LEVEL properties 3. Handled comments for PR #2454 This PR is dependent on PR #2454 - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? Yes. Will be done as part of jira-2651 - [ ] Testing done In Progress - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishgupta88/carbondata configurable_cache_columns Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2467.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2467 ---- commit 30c96944fa11a63cabfcecbd1fd45643c68106f2 Author: manishgupta88 <tomanishgupta18@...> Date: 2018-07-04T15:30:54Z Refactor Block and Blocklet DataMap to store only segmentProeprties Index instead of segmentProperties commit b7a1c24a284b56e7bb95029c37256abadea5cbd7 Author: manishgupta88 <tomanishgupta18@...> Date: 2018-07-08T08:54:21Z supported configuring column for min/max caching in driver ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5733/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6958/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6964/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6969/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5744/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2467 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5726/ --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2467#discussion_r201242748 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java --- @@ -289,20 +293,24 @@ public SegmentProperties getSegmentProperties() { return columnCardinality; } - public CarbonRowSchema[] getBlockSchema() { - return SchemaGenerator.createBlockSchema(segmentProperties); + public synchronized CarbonRowSchema[] getTaskSummarySchema() { + return taskSummarySchema; } - public CarbonRowSchema[] getBlocketSchema() { - return SchemaGenerator.createBlockletSchema(segmentProperties); + public synchronized void setTaskSummarySchema(CarbonRowSchema[] taskSummarySchema) { --- End diff -- I think synchronized is no required here , please check --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2467#discussion_r201292476 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java --- @@ -251,6 +273,137 @@ private static FilterExecuter getIncludeFilterExecuter( } } + /** + * check if current need to be replaced with TrueFilter expression. This will happen in case + * filter column min/max is not cached in the driver + * + * @param dimColEvaluatorInfoList + * @param msrColEvaluatorInfoList + * @param segmentProperties + * @param minMaxCacheColumns + * @return + */ + private static boolean checkIfCurrentNodeToBeReplacedWithTrueFilterExpression( + List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList, + List<MeasureColumnResolvedFilterInfo> msrColEvaluatorInfoList, + SegmentProperties segmentProperties, List<CarbonColumn> minMaxCacheColumns) { + boolean replaceCurrentNodeWithTrueFilter = false; + ColumnResolvedFilterInfo columnResolvedFilterInfo = null; + if (!msrColEvaluatorInfoList.isEmpty()) { + columnResolvedFilterInfo = msrColEvaluatorInfoList.get(0); + replaceCurrentNodeWithTrueFilter = + checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties, + minMaxCacheColumns, true); + } else { + columnResolvedFilterInfo = dimColEvaluatorInfoList.get(0); + if (!columnResolvedFilterInfo.getDimension().hasEncoding(Encoding.IMPLICIT)) { + replaceCurrentNodeWithTrueFilter = + checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties, + minMaxCacheColumns, false); + } + } + return replaceCurrentNodeWithTrueFilter; + } + + /** + * check if current need to be replaced with TrueFilter expression. This will happen in case + * filter column min/max is not cached in the driver + * + * @param filterExpressionResolverTree + * @param segmentProperties + * @param minMaxCacheColumns + * @return + */ + private static boolean checkIfCurrentNodeToBeReplacedWithTrueFilterExpression( + FilterResolverIntf filterExpressionResolverTree, SegmentProperties segmentProperties, + List<CarbonColumn> minMaxCacheColumns) { + boolean replaceCurrentNodeWithTrueFilter = false; + ColumnResolvedFilterInfo columnResolvedFilterInfo = null; + if (null != filterExpressionResolverTree.getMsrColResolvedFilterInfo()) { + columnResolvedFilterInfo = filterExpressionResolverTree.getMsrColResolvedFilterInfo(); + replaceCurrentNodeWithTrueFilter = + checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties, + minMaxCacheColumns, true); + } else { + columnResolvedFilterInfo = filterExpressionResolverTree.getDimColResolvedFilterInfo(); + if (!columnResolvedFilterInfo.getDimension().hasEncoding(Encoding.IMPLICIT)) { + replaceCurrentNodeWithTrueFilter = + checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties, + minMaxCacheColumns, false); + } + } + return replaceCurrentNodeWithTrueFilter; + } + + /** + * Method to check whether current node needs to be replaced with true filter to avoid pruning + * for case when filter column is not cached in the min/max cached dimension + * + * @param columnResolvedFilterInfo + * @param segmentProperties + * @param minMaxCacheColumns + * @param isMeasure + * @return + */ + private static boolean checkIfFilterColumnIsCachedInDriver( + ColumnResolvedFilterInfo columnResolvedFilterInfo, SegmentProperties segmentProperties, + List<CarbonColumn> minMaxCacheColumns, boolean isMeasure) { + boolean replaceCurrentNodeWithTrueFilter = false; + CarbonColumn columnFromCurrentBlock = null; + if (isMeasure) { + columnFromCurrentBlock = segmentProperties + .getMeasureFromCurrentBlock(columnResolvedFilterInfo.getMeasure().getColumnId()); + } else { + columnFromCurrentBlock = + segmentProperties.getDimensionFromCurrentBlock(columnResolvedFilterInfo.getDimension()); + } + if (null != columnFromCurrentBlock) { + // check for filter dimension in the cached column list + if (null != minMaxCacheColumns) { + int columnIndexInMinMaxByteArray = + getFilterDimensionIndexInCachedColumns(minMaxCacheColumns, columnFromCurrentBlock); + if (columnIndexInMinMaxByteArray != -1) { + columnResolvedFilterInfo.setColumnIndexInMinMaxByteArray(columnIndexInMinMaxByteArray); + } else { + // will be true only if column caching is enabled and current filter column is not cached + replaceCurrentNodeWithTrueFilter = true; + } + } else { + // if columns to be cached are not specified then in that case all columns will be cached + // and then the ordinal of column will be its index in the min/max byte array + if (isMeasure) { + columnResolvedFilterInfo.setColumnIndexInMinMaxByteArray( + segmentProperties.getLastDimensionColOrdinal() + columnFromCurrentBlock.getOrdinal()); + } else { + columnResolvedFilterInfo + .setColumnIndexInMinMaxByteArray(columnFromCurrentBlock.getOrdinal()); + } + } + } + return replaceCurrentNodeWithTrueFilter; + } + + /** + * Method to check whether the filter dimension exists in the cached dimensions for a table + * + * @param carbonDimensionsToBeCached + * @param filterColumn + * @return + */ + private static int getFilterDimensionIndexInCachedColumns( --- End diff -- Change method name as it will be called for both dimension and measure columns --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2467 @manishgupta88 LGTM except 2 minor comments , please re base --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2467 @kumarvishal09 ...handled review comments...kindly review and merge --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6993/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2467 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5747/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5773/ --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2467 Lgtm --- |
In reply to this post by qiuchenjian-2
|
Free forum by Nabble | Edit this page |