[GitHub] carbondata pull request #2820: [CARBONDATA-3013] Added support for pruning p...


[GitHub] carbondata pull request #2820: [CARBONDATA-3013] Added support for pruning p...

qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2820#discussion_r226863906
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---
    @@ -316,4 +320,167 @@ private BlockletScannedResult executeFilter(RawBlockletColumnChunks rawBlockletC
             readTime.getCount() + dimensionReadTime);
         return scannedResult;
       }
    +
    +  /**
    +   * This method will process the data in the below order:
    +   * 1. First apply min/max on the filter tree and check whether any of the filter
    +   * values fall within the min/max range; if not, return an empty result.
    +   * 2. If the filter falls within the min/max range, apply the filter on the actual
    +   * data and get the pruned pages.
    +   * 3. If the pruned pages are not empty, read only those blocks (measure or dimension)
    +   * which are present in the query but not in the filter; while applying the filter
    +   * some of the blocks were already read into the chunk holder, so there is no need to
    +   * read those blocks again. This avoids re-reading blocks that were already read.
    +   * 4. Set the blocks and the filtered pages on the scanned result.
    +   *
    +   * @param rawBlockletColumnChunks blocklet raw chunk of all columns
    +   * @throws FilterUnsupportedException
    +   */
    +  private BlockletScannedResult executeFilterForPages(
    +      RawBlockletColumnChunks rawBlockletColumnChunks)
    +      throws FilterUnsupportedException, IOException {
    +    long startTime = System.currentTimeMillis();
    +    QueryStatistic totalBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM);
    +    totalBlockletStatistic.addCountStatistic(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM,
    +        totalBlockletStatistic.getCount() + 1);
    +    // apply filter on actual data, for each page
    +    BitSet pages = this.filterExecuter.prunePages(rawBlockletColumnChunks);
    +    // if filter result is empty then return with empty result
    +    if (pages.isEmpty()) {
    +      CarbonUtil.freeMemory(rawBlockletColumnChunks.getDimensionRawColumnChunks(),
    +          rawBlockletColumnChunks.getMeasureRawColumnChunks());
    +
    +      QueryStatistic scanTime = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +          .get(QueryStatisticsConstants.SCAN_BLOCKlET_TIME);
    +      scanTime.addCountStatistic(QueryStatisticsConstants.SCAN_BLOCKlET_TIME,
    +          scanTime.getCount() + (System.currentTimeMillis() - startTime));
    +
    +      QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +          .get(QueryStatisticsConstants.PAGE_SCANNED);
    +      scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
    +          scannedPages.getCount());
    +      return createEmptyResult();
    +    }
    +
    +    BlockletScannedResult scannedResult =
    +        new FilterQueryScannedResult(blockExecutionInfo, queryStatisticsModel);
    +
    +    // valid scanned blocklet
    +    QueryStatistic validScannedBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM);
    +    validScannedBlockletStatistic
    +        .addCountStatistic(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM,
    +            validScannedBlockletStatistic.getCount() + 1);
    +    // adding statistics for valid number of pages
    +    QueryStatistic validPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.VALID_PAGE_SCANNED);
    +    validPages.addCountStatistic(QueryStatisticsConstants.VALID_PAGE_SCANNED,
    +        validPages.getCount() + pages.cardinality());
    +    QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.PAGE_SCANNED);
    +    scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
    +        scannedPages.getCount() + pages.cardinality());
    +    // get the row indexes from bit set for each page
    +    int[] pageFilteredPages = new int[pages.cardinality()];
    +    int index = 0;
    +    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
    +      pageFilteredPages[index++] = i;
    +    }
    +    // count(*) case: there would not be any dimensions or measures selected.
    +    int[] numberOfRows = new int[pages.cardinality()];
    +    for (int i = 0; i < numberOfRows.length; i++) {
    +      numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
    +    }
    +    long dimensionReadTime = System.currentTimeMillis();
    +    dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;
    +
    --- End diff --
   
    ok
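
The loop in the quoted diff that turns the `BitSet` returned by `prunePages` into a plain page-index array can be sketched standalone as below. This is a minimal sketch; the class and helper names are mine for illustration, not CarbonData's.

```java
import java.util.BitSet;

// Standalone sketch of the BitSet-to-index conversion from the quoted diff:
// each set bit marks a page that survived pruning, and the result array
// holds those page numbers in ascending order.
public class PrunedPagesSketch {

  static int[] toPageIndexes(BitSet pages) {
    int[] pageFilteredPages = new int[pages.cardinality()];
    int index = 0;
    // nextSetBit walks only the set bits, skipping pruned-out pages.
    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
      pageFilteredPages[index++] = i;
    }
    return pageFilteredPages;
  }

  public static void main(String[] args) {
    BitSet pages = new BitSet();
    pages.set(0);
    pages.set(2);
    pages.set(5);
    // prints 0, 2, 5 on separate lines
    for (int page : toPageIndexes(pages)) {
      System.out.println(page);
    }
  }
}
```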


---

[GitHub] carbondata pull request #2820: [CARBONDATA-3013] Added support for pruning p...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2820#discussion_r226866881
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java ---
    @@ -148,6 +148,61 @@ private void ifDefaultValueMatchesFilter() {
         return bitSet;
       }
     
    +  @Override
    +  public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    +      throws FilterUnsupportedException, IOException {
    --- End diff --
   
    Yes, a lot of code is duplicated across all the range filters; maybe we should combine some of the classes. We can do this refactoring in another PR.


---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Failed with Spark 2.1.0. Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/886/



---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Failed with Spark 2.3.1. Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9151/



---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Failed with Spark 2.2.1. Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1084/



---

[GitHub] carbondata pull request #2820: [CARBONDATA-3013] Added support for pruning p...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2820#discussion_r226905048
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java ---
    @@ -143,6 +144,40 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
         return null;
       }
     
    +  @Override
    +  public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    +      throws FilterUnsupportedException, IOException {
    +    if (isDimensionPresentInCurrentBlock) {
    +      int chunkIndex = segmentProperties.getDimensionOrdinalToChunkMapping()
    +          .get(dimColEvaluatorInfo.getColumnIndex());
    +      if (null == rawBlockletColumnChunks.getDimensionRawColumnChunks()[chunkIndex]) {
    --- End diff --
   
    Maybe we can take note of this point and add the page count to the blocklet metadata to avoid reading dimension chunks for the Exclude filter case.


---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Success with Spark 2.1.0. Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/899/



---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Failed with Spark 2.1.0. Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/903/



---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Success with Spark 2.3.1. Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9167/



---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Success with Spark 2.1.0. Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/910/



---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Success with Spark 2.1.0. Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/917/



---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Success with Spark 2.3.1. Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9177/



---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Success with Spark 2.2.1. Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1117/



---

[GitHub] carbondata pull request #2820: [CARBONDATA-3013] Added support for pruning p...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2820#discussion_r226973701
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RestructureEvaluatorImpl.java ---
    @@ -104,6 +108,12 @@ protected boolean isDimensionDefaultValuePresentInFilterValues(
         return isDefaultValuePresentInFilterValues;
       }
     
    +  @Override
    +  public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    +      throws FilterUnsupportedException, IOException {
    +    return new BitSet();
    --- End diff --
   
    I think for this operation we need to throw `FilterUnsupportedException`, similar to the `applyFilter` implementation.
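
A self-contained sketch of that suggestion: fail loudly from `prunePages` instead of silently returning an empty `BitSet`. The stub types below stand in for CarbonData's own classes; this is an illustration of the reviewer's point, not the project's actual code.

```java
import java.util.BitSet;

// Sketch: a restructure-style evaluator that rejects prunePages outright,
// mirroring how applyFilter signals unsupported operations.
public class PrunePagesThrowSketch {

  // Stub standing in for CarbonData's FilterUnsupportedException.
  static class FilterUnsupportedException extends Exception {
    FilterUnsupportedException(String msg) { super(msg); }
  }

  // Stub standing in for CarbonData's RawBlockletColumnChunks.
  static class RawBlockletColumnChunks { }

  static BitSet prunePages(RawBlockletColumnChunks chunks)
      throws FilterUnsupportedException {
    // Throwing makes the unsupported path explicit to callers, rather than
    // an empty BitSet that looks like "all pages pruned".
    throw new FilterUnsupportedException(
        "prunePages is not supported on restructure evaluators");
  }

  public static void main(String[] args) {
    try {
      prunePages(new RawBlockletColumnChunks());
    } catch (FilterUnsupportedException e) {
      System.out.println("threw: " + e.getMessage());
    }
  }
}
```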


---

[GitHub] carbondata pull request #2820: [CARBONDATA-3013] Added support for pruning p...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2820#discussion_r226973940
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtrThanEquaToFilterExecuterImpl.java ---
    @@ -331,6 +319,80 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
         }
       }
     
    +  private boolean isScanRequired(DimensionRawColumnChunk rawColumnChunk, int i) {
    --- End diff --
   
    Change `i` to `columnIndex`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2820: [CARBONDATA-3013] Added support for pruning p...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2820#discussion_r226974056
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---
    @@ -98,7 +98,11 @@ public BlockletFilterScanner(BlockExecutionInfo blockExecutionInfo,
       @Override
       public BlockletScannedResult scanBlocklet(RawBlockletColumnChunks rawBlockletColumnChunks)
           throws IOException, FilterUnsupportedException {
    -    return executeFilter(rawBlockletColumnChunks);
    +    if (blockExecutionInfo.isDirectVectorFill()) {
    +      return executeFilterForPages(rawBlockletColumnChunks);
    +    } else {
    +      return executeFilter(rawBlockletColumnChunks);
    --- End diff --
   
    As per the design I think we should follow the below hierarchy:
    `prune block -> prune blocklet -> prune pages -> prune rows (if row filtering is enabled)`
    With the current implementation we have two branches after `prune blocklet` (`prune pages` and `prune rows` in parallel) based on the directVectorFill configuration. The effort to correct the design would be significant, so I think we can raise a JIRA to track the issue and correct it in the near future.
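
The two-branch dispatch being discussed (from the quoted `scanBlocklet` diff) can be reduced to a tiny sketch. The method name and return values below are simplified stand-ins for the real `executeFilterForPages`/`executeFilter` calls.

```java
// Minimal sketch of the branch added in scanBlocklet: with direct vector
// fill enabled, pruning stops at page granularity; otherwise rows are
// filtered as well via the existing executeFilter path.
public class ScanDispatchSketch {

  static String scanBlocklet(boolean directVectorFill) {
    return directVectorFill ? "executeFilterForPages" : "executeFilter";
  }

  public static void main(String[] args) {
    System.out.println(scanBlocklet(true));   // executeFilterForPages
    System.out.println(scanBlocklet(false));  // executeFilter
  }
}
```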


---

[GitHub] carbondata pull request #2820: [CARBONDATA-3013] Added support for pruning p...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2820#discussion_r226974300
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---
    @@ -316,4 +320,164 @@ private BlockletScannedResult executeFilter(RawBlockletColumnChunks rawBlockletC
             readTime.getCount() + dimensionReadTime);
         return scannedResult;
       }
    +
    +  /**
    +   * This method will process the data in the below order:
    +   * 1. First apply min/max on the filter tree and check whether any of the filter
    +   * values fall within the min/max range; if not, return an empty result.
    +   * 2. If the filter falls within the min/max range, apply the filter on the actual
    +   * data and get the pruned pages.
    +   * 3. If the pruned pages are not empty, read only those blocks (measure or dimension)
    +   * which are present in the query but not in the filter; while applying the filter
    +   * some of the blocks were already read into the chunk holder, so there is no need to
    +   * read those blocks again. This avoids re-reading blocks that were already read.
    +   * 4. Set the blocks and the filtered pages on the scanned result.
    +   *
    +   * @param rawBlockletColumnChunks blocklet raw chunk of all columns
    +   * @throws FilterUnsupportedException
    +   */
    +  private BlockletScannedResult executeFilterForPages(
    +      RawBlockletColumnChunks rawBlockletColumnChunks)
    +      throws FilterUnsupportedException, IOException {
    +    long startTime = System.currentTimeMillis();
    +    QueryStatistic totalBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM);
    +    totalBlockletStatistic.addCountStatistic(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM,
    +        totalBlockletStatistic.getCount() + 1);
    +    // apply filter on actual data, for each page
    +    BitSet pages = this.filterExecuter.prunePages(rawBlockletColumnChunks);
    +    // if filter result is empty then return with empty result
    +    if (pages.isEmpty()) {
    +      CarbonUtil.freeMemory(rawBlockletColumnChunks.getDimensionRawColumnChunks(),
    +          rawBlockletColumnChunks.getMeasureRawColumnChunks());
    +
    +      QueryStatistic scanTime = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +          .get(QueryStatisticsConstants.SCAN_BLOCKlET_TIME);
    +      scanTime.addCountStatistic(QueryStatisticsConstants.SCAN_BLOCKlET_TIME,
    +          scanTime.getCount() + (System.currentTimeMillis() - startTime));
    +
    +      QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +          .get(QueryStatisticsConstants.PAGE_SCANNED);
    +      scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
    +          scannedPages.getCount());
    +      return createEmptyResult();
    +    }
    +
    +    BlockletScannedResult scannedResult =
    +        new FilterQueryScannedResult(blockExecutionInfo, queryStatisticsModel);
    +
    +    // valid scanned blocklet
    +    QueryStatistic validScannedBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM);
    +    validScannedBlockletStatistic
    +        .addCountStatistic(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM,
    +            validScannedBlockletStatistic.getCount() + 1);
    +    // adding statistics for valid number of pages
    +    QueryStatistic validPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.VALID_PAGE_SCANNED);
    +    validPages.addCountStatistic(QueryStatisticsConstants.VALID_PAGE_SCANNED,
    +        validPages.getCount() + pages.cardinality());
    +    QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.PAGE_SCANNED);
    +    scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
    +        scannedPages.getCount() + pages.cardinality());
    +    // get the row indexes from bit set for each page
    +    int[] pageFilteredPages = new int[pages.cardinality()];
    +    int index = 0;
    +    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
    +      pageFilteredPages[index++] = i;
    +    }
    +    // count(*) case: there would not be any dimensions or measures selected.
    +    int[] numberOfRows = new int[pages.cardinality()];
    +    for (int i = 0; i < numberOfRows.length; i++) {
    +      numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
    +    }
    +    long dimensionReadTime = System.currentTimeMillis();
    +    dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;
    --- End diff --
   
    `dimensionReadTime` is not measured at the correct place; compute this time properly.


---

[GitHub] carbondata issue #2820: [CARBONDATA-3013] Added support for pruning pages fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2820
 
    Build Success with Spark 2.1.0. Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/929/



---