Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2820#discussion_r226863906

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---

@@ -316,4 +320,167 @@ private BlockletScannedResult executeFilter(RawBlockletColumnChunks rawBlockletC
         readTime.getCount() + dimensionReadTime);
     return scannedResult;
   }
+
+  /**
+   * This method processes the data in the following order:
+   * 1. First apply min/max on the filter tree and check whether any of the filter values
+   *    fall within the min/max range; if not, return an empty result.
+   * 2. If the filter falls within the min/max range, apply the filter on the actual
+   *    data and get the pruned pages.
+   * 3. If the pruned pages are not empty, read only those blocks (measure or dimension)
+   *    which are present in the query but not in the filter; while applying the filter
+   *    some of the blocks were already read and are present in the chunk holder, so there
+   *    is no need to read them again. This avoids re-reading blocks that were already read.
+   * 4. Set the blocks and filtered pages on the scanned result.
+   *
+   * @param rawBlockletColumnChunks blocklet raw chunk of all columns
+   * @throws FilterUnsupportedException
+   */
+  private BlockletScannedResult executeFilterForPages(
+      RawBlockletColumnChunks rawBlockletColumnChunks)
+      throws FilterUnsupportedException, IOException {
+    long startTime = System.currentTimeMillis();
+    QueryStatistic totalBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
+        .get(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM);
+    totalBlockletStatistic.addCountStatistic(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM,
+        totalBlockletStatistic.getCount() + 1);
+    // apply filter on actual data, for each page
+    BitSet pages = this.filterExecuter.prunePages(rawBlockletColumnChunks);
+    // if the filter result is empty then return an empty result
+    if (pages.isEmpty()) {
+      CarbonUtil.freeMemory(rawBlockletColumnChunks.getDimensionRawColumnChunks(),
+          rawBlockletColumnChunks.getMeasureRawColumnChunks());
+
+      QueryStatistic scanTime = queryStatisticsModel.getStatisticsTypeAndObjMap()
+          .get(QueryStatisticsConstants.SCAN_BLOCKlET_TIME);
+      scanTime.addCountStatistic(QueryStatisticsConstants.SCAN_BLOCKlET_TIME,
+          scanTime.getCount() + (System.currentTimeMillis() - startTime));
+
+      QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
+          .get(QueryStatisticsConstants.PAGE_SCANNED);
+      scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
+          scannedPages.getCount());
+      return createEmptyResult();
+    }
+
+    BlockletScannedResult scannedResult =
+        new FilterQueryScannedResult(blockExecutionInfo, queryStatisticsModel);
+
+    // valid scanned blocklet
+    QueryStatistic validScannedBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
+        .get(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM);
+    validScannedBlockletStatistic
+        .addCountStatistic(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM,
+            validScannedBlockletStatistic.getCount() + 1);
+    // adding statistics for valid number of pages
+    QueryStatistic validPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
+        .get(QueryStatisticsConstants.VALID_PAGE_SCANNED);
+    validPages.addCountStatistic(QueryStatisticsConstants.VALID_PAGE_SCANNED,
+        validPages.getCount() + pages.cardinality());
+    QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
+        .get(QueryStatisticsConstants.PAGE_SCANNED);
+    scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
+        scannedPages.getCount() + pages.cardinality());
+    // get the page indexes from the bit set of pruned pages
+    int[] pageFilteredPages = new int[pages.cardinality()];
+    int index = 0;
+    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
+      pageFilteredPages[index++] = i;
+    }
+    // in the count(*) case there would not be any dimensions or measures selected
+    int[] numberOfRows = new int[pages.cardinality()];
+    for (int i = 0; i < numberOfRows.length; i++) {
+      numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
+    }
+    long dimensionReadTime = System.currentTimeMillis();
+    dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;
+

--- End diff --

ok

---
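To make the intent of the javadoc above concrete, here is a minimal, self-contained sketch of steps 1, 2 and 4: a min/max check decides which pages can possibly satisfy a greater-than filter, and the surviving page indexes are then collected from a BitSet, much as executeFilterForPages does. The class name and the pageMax/filterValue values are made up for illustration and are not part of the PR.

import java.util.BitSet;

// Illustrative only: hypothetical page max values, not CarbonData's real metadata API.
public class PageMinMaxPruningSketch {

  // Returns true if a page whose values end at pageMax could contain
  // a value satisfying "column > filterValue" (the greater-than filter case).
  static boolean isScanRequired(int pageMax, int filterValue) {
    return pageMax > filterValue;
  }

  public static void main(String[] args) {
    int[] pageMax = {10, 55, 30, 90};   // max value of each page (assumed)
    int filterValue = 40;               // predicate: column > 40

    // steps 1-2: mark pages whose min/max range can satisfy the filter
    BitSet pages = new BitSet(pageMax.length);
    for (int i = 0; i < pageMax.length; i++) {
      if (isScanRequired(pageMax[i], filterValue)) {
        pages.set(i);
      }
    }

    // step 4: collect the surviving page indexes from the BitSet
    int[] pageFilteredPages = new int[pages.cardinality()];
    int index = 0;
    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
      pageFilteredPages[index++] = i;
    }
    System.out.println(java.util.Arrays.toString(pageFilteredPages)); // prints [1, 3]
  }
}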
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2820#discussion_r226866881

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java ---

@@ -148,6 +148,61 @@ private void ifDefaultValueMatchesFilter() {
     return bitSet;
   }
+
+  @Override
+  public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
+      throws FilterUnsupportedException, IOException {

--- End diff --

Yes, a lot of code is duplicated across all the range filters; maybe we should combine some of the classes. We can do this refactoring in another PR.

---
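As a rough illustration of the refactoring being deferred to a follow-up PR, the duplicated prunePages logic across the range filter executors could be pulled into a shared template method, with each subclass supplying only its comparison. All class names below are hypothetical, and long-valued min/max values stand in for the byte[] min/max used by the real executors; this is a sketch, not CarbonData code.

import java.util.BitSet;

// Hypothetical template-method sketch; these classes do not exist in CarbonData.
abstract class RangeFilterPruningSketch {

  // Shared pruning skeleton: iterate pages once, delegate only the comparison.
  BitSet prunePages(long[] pageMin, long[] pageMax) {
    BitSet pages = new BitSet(pageMax.length);
    for (int i = 0; i < pageMax.length; i++) {
      if (isScanRequired(pageMin[i], pageMax[i])) {
        pages.set(i);
      }
    }
    return pages;
  }

  // Each concrete range filter (>, >=, <, <=) implements only this comparison.
  abstract boolean isScanRequired(long pageMin, long pageMax);
}

// Example subclass for the greater-than case.
class GreaterThanPruningSketch extends RangeFilterPruningSketch {
  private final long filterValue;

  GreaterThanPruningSketch(long filterValue) {
    this.filterValue = filterValue;
  }

  @Override
  boolean isScanRequired(long pageMin, long pageMax) {
    // a page can satisfy "column > filterValue" only if its max exceeds the filter value
    return pageMax > filterValue;
  }

  public static void main(String[] args) {
    RangeFilterPruningSketch filter = new GreaterThanPruningSketch(40L);
    System.out.println(filter.prunePages(new long[]{0, 20, 35}, new long[]{10, 55, 90})); // {1, 2}
  }
}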
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/886/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9151/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1084/ --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2820#discussion_r226905048

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java ---

@@ -143,6 +144,40 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
     return null;
   }
+
+  @Override
+  public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
+      throws FilterUnsupportedException, IOException {
+    if (isDimensionPresentInCurrentBlock) {
+      int chunkIndex = segmentProperties.getDimensionOrdinalToChunkMapping()
+          .get(dimColEvaluatorInfo.getColumnIndex());
+      if (null == rawBlockletColumnChunks.getDimensionRawColumnChunks()[chunkIndex]) {

--- End diff --

Maybe we can take note of this point and add the page count to the blocklet metadata, to avoid reading the dimension chunks for the Exclude filter case.

---
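A minimal sketch of the idea behind this suggestion, assuming a page count were available in the blocklet metadata: an exclude filter cannot eliminate pages from min/max alone, so it could select every page without reading the dimension chunk. BlockletMetadataSketch and getPageCount() are hypothetical names, not CarbonData APIs.

import java.util.BitSet;

// Hypothetical metadata holder carrying only the per-blocklet page count.
class BlockletMetadataSketch {
  private final int pageCount;

  BlockletMetadataSketch(int pageCount) {
    this.pageCount = pageCount;
  }

  int getPageCount() {
    return pageCount;
  }
}

class ExcludePrunePagesSketch {
  // With a page count in the metadata, no column chunk has to be read to build this BitSet.
  static BitSet prunePages(BlockletMetadataSketch metadata) {
    BitSet pages = new BitSet(metadata.getPageCount());
    pages.set(0, metadata.getPageCount()); // exclude filter keeps all pages at this stage
    return pages;
  }

  public static void main(String[] args) {
    System.out.println(prunePages(new BlockletMetadataSketch(4))); // prints {0, 1, 2, 3}
  }
}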
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/899/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/903/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9167/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/910/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/917/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9177/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1117/ --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2820#discussion_r226973701

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RestructureEvaluatorImpl.java ---

@@ -104,6 +108,12 @@ protected boolean isDimensionDefaultValuePresentInFilterValues(
     return isDefaultValuePresentInFilterValues;
   }
+
+  @Override
+  public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
+      throws FilterUnsupportedException, IOException {
+    return new BitSet();

--- End diff --

I think for this operation we need to throw `FilterUnsupportedException`, similar to the applyFilter implementation.

---
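A minimal sketch of the change being suggested, assuming FilterUnsupportedException offers a String-message constructor; this is the reviewer's proposal, not code from the PR, and it would replace the method body shown in the diff above.

// Sketch of the suggested change: instead of silently returning an empty BitSet,
// signal that page pruning is not supported for this evaluator.
@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  throw new FilterUnsupportedException(
      "Page pruning is not supported for restructure evaluators");
}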
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2820#discussion_r226973940

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtrThanEquaToFilterExecuterImpl.java ---

@@ -331,6 +319,80 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
     }
   }
+
+  private boolean isScanRequired(DimensionRawColumnChunk rawColumnChunk, int i) {

--- End diff --

Change `i` to `columnIndex`.

---
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2820#discussion_r226974056

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---

@@ -98,7 +98,11 @@ public BlockletFilterScanner(BlockExecutionInfo blockExecutionInfo,
   @Override
   public BlockletScannedResult scanBlocklet(RawBlockletColumnChunks rawBlockletColumnChunks)
       throws IOException, FilterUnsupportedException {
-    return executeFilter(rawBlockletColumnChunks);
+    if (blockExecutionInfo.isDirectVectorFill()) {
+      return executeFilterForPages(rawBlockletColumnChunks);
+    } else {
+      return executeFilter(rawBlockletColumnChunks);

--- End diff --

As per the design, I think we should follow the hierarchy `prune block -> prune blocklet -> prune pages -> prune rows (if row filtering is enabled)`. With the current implementation we have two branches after `prune blocklet`: `prune pages` and `prune rows` run as parallel paths, selected by the directVectorFill configuration. The effort to correct the design will be significant, so I think we can raise a JIRA to track the issue and correct it in the near future.

---
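A minimal sketch of the single pruning hierarchy described above, with all four stages in one path; the class name and boolean inputs are hypothetical and only illustrate the control flow, not the CarbonData scanner API.

import java.util.BitSet;

// Illustrative sketch of prune block -> prune blocklet -> prune pages -> prune rows.
class PruningHierarchySketch {

  static BitSet scan(boolean blockSurvivesMinMax, boolean blockletSurvivesMinMax,
      BitSet prunedPages, boolean rowFilteringEnabled) {
    // 1. prune block: whole block eliminated by block-level min/max
    if (!blockSurvivesMinMax) {
      return new BitSet();
    }
    // 2. prune blocklet: blocklet eliminated by blocklet-level min/max
    if (!blockletSurvivesMinMax) {
      return new BitSet();
    }
    // 3. prune pages: keep only pages whose data can satisfy the filter
    if (prunedPages.isEmpty()) {
      return new BitSet();
    }
    // 4. prune rows, only when row-level filtering is enabled; otherwise the
    //    surviving pages are handed over for direct vector fill
    if (rowFilteringEnabled) {
      // row-level filtering would refine the result further (omitted in this sketch)
    }
    return prunedPages;
  }

  public static void main(String[] args) {
    BitSet pages = new BitSet();
    pages.set(2);
    System.out.println(scan(true, true, pages, false)); // prints {2}
  }
}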
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2820#discussion_r226974300

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---

@@ -316,4 +320,164 @@ private BlockletScannedResult executeFilter(RawBlockletColumnChunks rawBlockletC
         readTime.getCount() + dimensionReadTime);
     return scannedResult;
   }
+
+  /**
+   * This method processes the data in the following order:
+   * 1. First apply min/max on the filter tree and check whether any of the filter values
+   *    fall within the min/max range; if not, return an empty result.
+   * 2. If the filter falls within the min/max range, apply the filter on the actual
+   *    data and get the pruned pages.
+   * 3. If the pruned pages are not empty, read only those blocks (measure or dimension)
+   *    which are present in the query but not in the filter; while applying the filter
+   *    some of the blocks were already read and are present in the chunk holder, so there
+   *    is no need to read them again. This avoids re-reading blocks that were already read.
+   * 4. Set the blocks and filtered pages on the scanned result.
+   *
+   * @param rawBlockletColumnChunks blocklet raw chunk of all columns
+   * @throws FilterUnsupportedException
+   */
+  private BlockletScannedResult executeFilterForPages(
+      RawBlockletColumnChunks rawBlockletColumnChunks)
+      throws FilterUnsupportedException, IOException {
+    long startTime = System.currentTimeMillis();
+    QueryStatistic totalBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
+        .get(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM);
+    totalBlockletStatistic.addCountStatistic(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM,
+        totalBlockletStatistic.getCount() + 1);
+    // apply filter on actual data, for each page
+    BitSet pages = this.filterExecuter.prunePages(rawBlockletColumnChunks);
+    // if the filter result is empty then return an empty result
+    if (pages.isEmpty()) {
+      CarbonUtil.freeMemory(rawBlockletColumnChunks.getDimensionRawColumnChunks(),
+          rawBlockletColumnChunks.getMeasureRawColumnChunks());
+
+      QueryStatistic scanTime = queryStatisticsModel.getStatisticsTypeAndObjMap()
+          .get(QueryStatisticsConstants.SCAN_BLOCKlET_TIME);
+      scanTime.addCountStatistic(QueryStatisticsConstants.SCAN_BLOCKlET_TIME,
+          scanTime.getCount() + (System.currentTimeMillis() - startTime));
+
+      QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
+          .get(QueryStatisticsConstants.PAGE_SCANNED);
+      scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
+          scannedPages.getCount());
+      return createEmptyResult();
+    }
+
+    BlockletScannedResult scannedResult =
+        new FilterQueryScannedResult(blockExecutionInfo, queryStatisticsModel);
+
+    // valid scanned blocklet
+    QueryStatistic validScannedBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
+        .get(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM);
+    validScannedBlockletStatistic
+        .addCountStatistic(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM,
+            validScannedBlockletStatistic.getCount() + 1);
+    // adding statistics for valid number of pages
+    QueryStatistic validPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
+        .get(QueryStatisticsConstants.VALID_PAGE_SCANNED);
+    validPages.addCountStatistic(QueryStatisticsConstants.VALID_PAGE_SCANNED,
+        validPages.getCount() + pages.cardinality());
+    QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
+        .get(QueryStatisticsConstants.PAGE_SCANNED);
+    scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
+        scannedPages.getCount() + pages.cardinality());
+    // get the page indexes from the bit set of pruned pages
+    int[] pageFilteredPages = new int[pages.cardinality()];
+    int index = 0;
+    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
+      pageFilteredPages[index++] = i;
+    }
+    // in the count(*) case there would not be any dimensions or measures selected
+    int[] numberOfRows = new int[pages.cardinality()];
+    for (int i = 0; i < numberOfRows.length; i++) {
+      numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
+    }
+    long dimensionReadTime = System.currentTimeMillis();
+    dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;

--- End diff --

`dimensionReadTime` is not computed at the correct place; compute this time properly.

---
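As a rough illustration of the timing issue flagged here, the elapsed time would be accumulated around the actual chunk reads instead of subtracting two back-to-back timestamps; readChunksAndMeasure and the Runnable stand-in below are hypothetical and not the fix merged in the PR.

// Hypothetical sketch: measure dimension read time around the work itself.
class ReadTimeSketch {
  static long readChunksAndMeasure(Runnable readChunks) {
    long start = System.currentTimeMillis();
    readChunks.run();                              // the chunk reads being timed
    return System.currentTimeMillis() - start;     // elapsed time, added to the read statistic
  }

  public static void main(String[] args) {
    long dimensionReadTime = readChunksAndMeasure(() -> { /* read remaining column chunks */ });
    System.out.println("dimensionReadTime = " + dimensionReadTime + " ms");
  }
}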
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2820 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/929/ --- |