GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Failed with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/9133/
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Failed with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1067/
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Failed with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/870/
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/871/
GitHub user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2822#discussion_r226301860

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---

@@ -316,4 +320,167 @@ private BlockletScannedResult executeFilter(RawBlockletColumnChunks rawBlockletC
          readTime.getCount() + dimensionReadTime);
      return scannedResult;
    }

    /**
     * This method will process the data in the below order:
     * 1. First apply min/max on the filter tree and check whether any of the filters
     *    fall within the min/max range; if not, return an empty result.
     * 2. If the filter falls within the min/max range, apply the filter on the actual
     *    data and get the pruned pages.
     * 3. If the pruned pages are not empty, read only those blocks (measure or dimension)
     *    which were present in the query but not in the filter; while applying the filter,
     *    some of the blocks were already read and are present in the chunk holder, so there
     *    is no need to read those blocks again.
     * 4. Set the blocks and the filtered pages on the scanned result.
     *
     * @param rawBlockletColumnChunks blocklet raw chunk of all columns
     * @throws FilterUnsupportedException
     */
    private BlockletScannedResult executeFilterForPages(
        RawBlockletColumnChunks rawBlockletColumnChunks)
        throws FilterUnsupportedException, IOException {
      long startTime = System.currentTimeMillis();
      QueryStatistic totalBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
          .get(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM);
      totalBlockletStatistic.addCountStatistic(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM,
          totalBlockletStatistic.getCount() + 1);
      // apply filter on actual data, for each page
      BitSet pages = this.filterExecuter.prunePages(rawBlockletColumnChunks);
      // if the filter result is empty then return an empty result
      if (pages.isEmpty()) {
        CarbonUtil.freeMemory(rawBlockletColumnChunks.getDimensionRawColumnChunks(),
            rawBlockletColumnChunks.getMeasureRawColumnChunks());

        QueryStatistic scanTime = queryStatisticsModel.getStatisticsTypeAndObjMap()
            .get(QueryStatisticsConstants.SCAN_BLOCKlET_TIME);
        scanTime.addCountStatistic(QueryStatisticsConstants.SCAN_BLOCKlET_TIME,
            scanTime.getCount() + (System.currentTimeMillis() - startTime));

        QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
            .get(QueryStatisticsConstants.PAGE_SCANNED);
        scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
            scannedPages.getCount());
        return createEmptyResult();
      }

      BlockletScannedResult scannedResult =
          new FilterQueryScannedResult(blockExecutionInfo, queryStatisticsModel);

      // valid scanned blocklet
      QueryStatistic validScannedBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
          .get(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM);
      validScannedBlockletStatistic
          .addCountStatistic(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM,
              validScannedBlockletStatistic.getCount() + 1);
      // adding statistics for the valid number of pages
      QueryStatistic validPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
          .get(QueryStatisticsConstants.VALID_PAGE_SCANNED);
      validPages.addCountStatistic(QueryStatisticsConstants.VALID_PAGE_SCANNED,
          validPages.getCount() + pages.cardinality());
      QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
          .get(QueryStatisticsConstants.PAGE_SCANNED);
      scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
          scannedPages.getCount() + pages.cardinality());
      // get the page indexes from the bit set
      int[] pageFilteredPages = new int[pages.cardinality()];
      int index = 0;
      for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
        pageFilteredPages[index++] = i;
      }
      // in the count(*) case there would not be any dimensions or measures selected
      int[] numberOfRows = new int[pages.cardinality()];
      for (int i = 0; i < numberOfRows.length; i++) {
        numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);

--- End diff --

This will fill numberOfRows for the pages incorrectly. I think it should be:

    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
      pageFilteredPages[index] = i;
      numberOfRows[index++] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
    }
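For illustration, a minimal self-contained Java sketch of the reviewer's point (the pageRowCount() helper below is a hypothetical stand-in for getPageRowCount(); it is not the CarbonData API) shows why indexing the row counts by the dense loop counter reads the wrong pages whenever the surviving page ids are non-contiguous:

    import java.util.BitSet;

    public class PagePruneDemo {
      // hypothetical stand-in for rawBlockletColumnChunks.getDataBlock().getPageRowCount(i)
      static int pageRowCount(int page) {
        return 32000 - page; // pretend each page has a slightly different row count
      }

      public static void main(String[] args) {
        // pages 2 and 5 survived pruning; pages 0, 1, 3 and 4 did not
        BitSet pages = new BitSet();
        pages.set(2);
        pages.set(5);

        int[] pageFilteredPages = new int[pages.cardinality()];
        int[] numberOfRows = new int[pages.cardinality()];

        // The reviewed loop does numberOfRows[i] = pageRowCount(i) for i = 0..1,
        // i.e. it reads the row counts of pages 0 and 1, not of pages 2 and 5.

        // Suggested fix: one pass over the set bits fills both arrays consistently.
        int index = 0;
        for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
          pageFilteredPages[index] = i;
          numberOfRows[index++] = pageRowCount(i);
        }

        // prints [2, 5] and [31998, 31995]
        System.out.println(java.util.Arrays.toString(pageFilteredPages));
        System.out.println(java.util.Arrays.toString(numberOfRows));
      }
    }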
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Failed with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1069/
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Failed with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/9136/
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Success with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/872/
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Failed with Spark 2.3.1. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/9137/
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2822

Build Failed with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1070/
GitHub user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2822#discussion_r226830306

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java (same hunk as quoted in the review comment above) ---

--- End diff --

ok
GitHub user ravipesala closed the pull request at:
https://github.com/apache/carbondata/pull/2822
GitHub user ravipesala reopened a pull request:
https://github.com/apache/carbondata/pull/2822

[CARBONDATA-3014] Added support for inverted index and delete delta for direct scan queries

This PR depends on PR https://github.com/apache/carbondata/pull/2820

Added new classes to support inverted index and delete delta directly from the column vector:
`ColumnarVectorWrapperDirectWithInvertedIndex`
`ColumnarVectorWrapperDirectWithDeleteDelta`
`ColumnarVectorWrapperDirectWithDeleteDeltaAndInvertedIndex`

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on
   - Whether new unit test cases have been added or why no new tests are required?
   - How it is tested? Please attach the test report.
   - Is it a performance related change? Please attach the performance test report.
   - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata perf-inverted-index

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2822.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2822

----

commit 42dfd6adec741e9cf98af92e8a4c3d7810a681e8
Author: ravipesala <ravi.pesala@...>
Date: 2018-10-16T05:02:18Z

    Add carbon property to configure vector based row pruning push down

commit d9ae60c8f7b0b90d6b5a113043c5ec4cd3acf726
Author: ravipesala <ravi.pesala@...>
Date: 2018-10-16T06:00:43Z

    Added support for full scan queries for vector direct fill.

commit ff36f4b55f26732b6a669fcd2edd4e958a04818a
Author: ravipesala <ravi.pesala@...>
Date: 2018-10-21T13:44:11Z

    Fix comments

commit 12878a2591795e53826f615dc54fc3d443227a41
Author: ravipesala <ravi.pesala@...>
Date: 2018-10-16T09:23:14Z

    Added support for pruning pages for vector direct fill.

commit 12bed1a2b875962a90621af6f638a41e7e3f6d4f
Author: ravipesala <ravi.pesala@...>
Date: 2018-10-21T15:27:50Z

    Fix comments

commit 1b08711b3c88539267363735884884499f5586f8
Author: ravipesala <ravi.pesala@...>
Date: 2018-10-16T11:07:18Z

    Added support for inverted index and delete delta for direct scan queries

----
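The wrapper classes themselves are not quoted in this thread. As a rough, hypothetical Java sketch of the decorator idea behind `ColumnarVectorWrapperDirectWithInvertedIndex` (only the three class names listed above come from the PR; every other name below is invented for illustration and is not the CarbonData API), an inverted-index wrapper can remap each write to the row's original position while delegating storage to the underlying vector:

    import java.util.Arrays;

    public class InvertedIndexDemo {

      interface SimpleColumnVector {
        void putInt(int rowId, int value);
      }

      static class ArrayColumnVector implements SimpleColumnVector {
        final int[] data;
        ArrayColumnVector(int size) { data = new int[size]; }
        public void putInt(int rowId, int value) { data[rowId] = value; }
      }

      // Values arrive in the column's stored (sorted) order; the wrapper redirects
      // each write to the row's original position, so a decoder can fill the vector
      // directly without materializing intermediate row batches.
      static class InvertedIndexVectorWrapper implements SimpleColumnVector {
        private final SimpleColumnVector delegate;
        private final int[] invertedIndex; // stored position -> original row id

        InvertedIndexVectorWrapper(SimpleColumnVector delegate, int[] invertedIndex) {
          this.delegate = delegate;
          this.invertedIndex = invertedIndex;
        }

        public void putInt(int rowId, int value) {
          delegate.putInt(invertedIndex[rowId], value);
        }
      }

      public static void main(String[] args) {
        // sorted column values 10, 20, 30 belong to original rows 2, 0, 1
        int[] invertedIndex = {2, 0, 1};
        ArrayColumnVector vector = new ArrayColumnVector(3);
        SimpleColumnVector wrapped = new InvertedIndexVectorWrapper(vector, invertedIndex);
        for (int i = 0; i < 3; i++) {
          wrapped.putInt(i, (i + 1) * 10);
        }
        // prints [20, 30, 10]: each value landed at its original row position
        System.out.println(Arrays.toString(vector.data));
      }
    }

A delete-delta wrapper would presumably follow the same pattern, consulting the delete-delta information to skip or shift deleted row ids before delegating the write, with the combined wrapper composing both remappings.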
GitHub user ravipesala commented on the issue (the same comment was posted seven times to trigger CI retests):
https://github.com/apache/carbondata/pull/2822

retest this please