[GitHub] carbondata pull request #2822: [CARBONDATA-3014] Added support for inverted ...


[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9133/



---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1067/



---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/870/



---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/871/



---

[GitHub] carbondata pull request #2822: [CARBONDATA-3014] Added support for inverted ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2822#discussion_r226301860
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---
    @@ -316,4 +320,167 @@ private BlockletScannedResult executeFilter(RawBlockletColumnChunks rawBlockletC
             readTime.getCount() + dimensionReadTime);
         return scannedResult;
       }
    +
    +  /**
    +   * This method will process the data in the below order:
    +   * 1. First apply min/max on the filter tree and check whether any filter value
    +   * falls within the min/max range; if not, return an empty result.
    +   * 2. If the filter falls within the min/max range, then apply the filter on the actual
    +   * data and get the pruned pages.
    +   * 3. If the pruned pages are not empty, then read only those blocks (measure or dimension)
    +   * which are present in the query but not in the filter; while applying the filter
    +   * some of the blocks were already read and are present in the chunk holder, so there is
    +   * no need to read those blocks again. This avoids re-reading blocks that were already read.
    +   * 4. Set the blocks and filtered pages on the scanned result.
    +   *
    +   * @param rawBlockletColumnChunks blocklet raw chunk of all columns
    +   * @throws FilterUnsupportedException
    +   */
    +  private BlockletScannedResult executeFilterForPages(
    +      RawBlockletColumnChunks rawBlockletColumnChunks)
    +      throws FilterUnsupportedException, IOException {
    +    long startTime = System.currentTimeMillis();
    +    QueryStatistic totalBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM);
    +    totalBlockletStatistic.addCountStatistic(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM,
    +        totalBlockletStatistic.getCount() + 1);
    +    // apply filter on actual data, for each page
    +    BitSet pages = this.filterExecuter.prunePages(rawBlockletColumnChunks);
    +    // if filter result is empty then return with empty result
    +    if (pages.isEmpty()) {
    +      CarbonUtil.freeMemory(rawBlockletColumnChunks.getDimensionRawColumnChunks(),
    +          rawBlockletColumnChunks.getMeasureRawColumnChunks());
    +
    +      QueryStatistic scanTime = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +          .get(QueryStatisticsConstants.SCAN_BLOCKlET_TIME);
    +      scanTime.addCountStatistic(QueryStatisticsConstants.SCAN_BLOCKlET_TIME,
    +          scanTime.getCount() + (System.currentTimeMillis() - startTime));
    +
    +      QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +          .get(QueryStatisticsConstants.PAGE_SCANNED);
    +      scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
    +          scannedPages.getCount());
    +      return createEmptyResult();
    +    }
    +
    +    BlockletScannedResult scannedResult =
    +        new FilterQueryScannedResult(blockExecutionInfo, queryStatisticsModel);
    +
    +    // valid scanned blocklet
    +    QueryStatistic validScannedBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM);
    +    validScannedBlockletStatistic
    +        .addCountStatistic(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM,
    +            validScannedBlockletStatistic.getCount() + 1);
    +    // adding statistics for valid number of pages
    +    QueryStatistic validPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.VALID_PAGE_SCANNED);
    +    validPages.addCountStatistic(QueryStatisticsConstants.VALID_PAGE_SCANNED,
    +        validPages.getCount() + pages.cardinality());
    +    QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.PAGE_SCANNED);
    +    scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
    +        scannedPages.getCount() + pages.cardinality());
    +    // get the page indexes from the bit set of pruned pages
    +    int[] pageFilteredPages = new int[pages.cardinality()];
    +    int index = 0;
    +    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
    +      pageFilteredPages[index++] = i;
    +    }
    +    // in the count(*) case there would not be any dimensions or measures selected.
    +    int[] numberOfRows = new int[pages.cardinality()];
    +    for (int i = 0; i < numberOfRows.length; i++) {
    +      numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
    --- End diff --
   
    This will fill numberOfRows for the pages incorrectly, because it looks up the row count for
    the loop index (0 .. cardinality-1) instead of the actual pruned page numbers. I think it
    should be:
    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
      pageFilteredPages[index] = i;
      numberOfRows[index++] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
    }
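
    For readers following the thread, here is a minimal, self-contained sketch of the behaviour
    being suggested. The 5-page blocklet and the pageRowCount array are hypothetical stand-ins
    for rawBlockletColumnChunks.getDataBlock().getPageRowCount(i); the point is only that both
    arrays must be filled from the set bits of the pruned-page BitSet, not from a 0-based loop
    counter:

        import java.util.Arrays;
        import java.util.BitSet;

        public class PrunedPagesSketch {
          public static void main(String[] args) {
            // Hypothetical per-page row counts for a blocklet with 5 pages.
            int[] pageRowCount = {32000, 32000, 32000, 32000, 1500};

            // Assume the filter pruned everything except pages 1 and 4.
            BitSet pages = new BitSet();
            pages.set(1);
            pages.set(4);

            int[] pageFilteredPages = new int[pages.cardinality()];
            int[] numberOfRows = new int[pages.cardinality()];

            // Fill both arrays from the set bits, so the row count is looked up
            // for the pruned page numbers (1 and 4), not for indexes 0 and 1.
            int index = 0;
            for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
              pageFilteredPages[index] = i;
              numberOfRows[index++] = pageRowCount[i];
            }

            System.out.println(Arrays.toString(pageFilteredPages)); // [1, 4]
            System.out.println(Arrays.toString(numberOfRows));      // [32000, 1500]
          }
        }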


---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1069/



---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9136/



---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/872/



---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9137/



---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1070/



---

[GitHub] carbondata pull request #2822: [CARBONDATA-3014] Added support for inverted ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2822#discussion_r226830306
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---
    @@ -316,4 +320,167 @@ private BlockletScannedResult executeFilter(RawBlockletColumnChunks rawBlockletC
             readTime.getCount() + dimensionReadTime);
         return scannedResult;
       }
    +
    +  /**
    +   * This method will process the data in the below order:
    +   * 1. First apply min/max on the filter tree and check whether any filter value
    +   * falls within the min/max range; if not, return an empty result.
    +   * 2. If the filter falls within the min/max range, then apply the filter on the actual
    +   * data and get the pruned pages.
    +   * 3. If the pruned pages are not empty, then read only those blocks (measure or dimension)
    +   * which are present in the query but not in the filter; while applying the filter
    +   * some of the blocks were already read and are present in the chunk holder, so there is
    +   * no need to read those blocks again. This avoids re-reading blocks that were already read.
    +   * 4. Set the blocks and filtered pages on the scanned result.
    +   *
    +   * @param rawBlockletColumnChunks blocklet raw chunk of all columns
    +   * @throws FilterUnsupportedException
    +   */
    +  private BlockletScannedResult executeFilterForPages(
    +      RawBlockletColumnChunks rawBlockletColumnChunks)
    +      throws FilterUnsupportedException, IOException {
    +    long startTime = System.currentTimeMillis();
    +    QueryStatistic totalBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM);
    +    totalBlockletStatistic.addCountStatistic(QueryStatisticsConstants.TOTAL_BLOCKLET_NUM,
    +        totalBlockletStatistic.getCount() + 1);
    +    // apply filter on actual data, for each page
    +    BitSet pages = this.filterExecuter.prunePages(rawBlockletColumnChunks);
    +    // if filter result is empty then return with empty result
    +    if (pages.isEmpty()) {
    +      CarbonUtil.freeMemory(rawBlockletColumnChunks.getDimensionRawColumnChunks(),
    +          rawBlockletColumnChunks.getMeasureRawColumnChunks());
    +
    +      QueryStatistic scanTime = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +          .get(QueryStatisticsConstants.SCAN_BLOCKlET_TIME);
    +      scanTime.addCountStatistic(QueryStatisticsConstants.SCAN_BLOCKlET_TIME,
    +          scanTime.getCount() + (System.currentTimeMillis() - startTime));
    +
    +      QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +          .get(QueryStatisticsConstants.PAGE_SCANNED);
    +      scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
    +          scannedPages.getCount());
    +      return createEmptyResult();
    +    }
    +
    +    BlockletScannedResult scannedResult =
    +        new FilterQueryScannedResult(blockExecutionInfo, queryStatisticsModel);
    +
    +    // valid scanned blocklet
    +    QueryStatistic validScannedBlockletStatistic = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM);
    +    validScannedBlockletStatistic
    +        .addCountStatistic(QueryStatisticsConstants.VALID_SCAN_BLOCKLET_NUM,
    +            validScannedBlockletStatistic.getCount() + 1);
    +    // adding statistics for valid number of pages
    +    QueryStatistic validPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.VALID_PAGE_SCANNED);
    +    validPages.addCountStatistic(QueryStatisticsConstants.VALID_PAGE_SCANNED,
    +        validPages.getCount() + pages.cardinality());
    +    QueryStatistic scannedPages = queryStatisticsModel.getStatisticsTypeAndObjMap()
    +        .get(QueryStatisticsConstants.PAGE_SCANNED);
    +    scannedPages.addCountStatistic(QueryStatisticsConstants.PAGE_SCANNED,
    +        scannedPages.getCount() + pages.cardinality());
    +    // get the page indexes from the bit set of pruned pages
    +    int[] pageFilteredPages = new int[pages.cardinality()];
    +    int index = 0;
    +    for (int i = pages.nextSetBit(0); i >= 0; i = pages.nextSetBit(i + 1)) {
    +      pageFilteredPages[index++] = i;
    +    }
    +    // in the count(*) case there would not be any dimensions or measures selected.
    +    int[] numberOfRows = new int[pages.cardinality()];
    +    for (int i = 0; i < numberOfRows.length; i++) {
    +      numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
    --- End diff --
   
    ok


---

[GitHub] carbondata pull request #2822: [CARBONDATA-3014] Added support for inverted ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala closed the pull request at:

    https://github.com/apache/carbondata/pull/2822


---

[GitHub] carbondata pull request #2822: [CARBONDATA-3014] Added support for inverted ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
GitHub user ravipesala reopened a pull request:

    https://github.com/apache/carbondata/pull/2822

    [CARBONDATA-3014] Added support for inverted index and delete delta for direct scan queries

    This PR depends on PR https://github.com/apache/carbondata/pull/2820
   
    Added new classes to support inverted index and delete delta directly on the column vector; a simplified sketch follows the class list below.
    `ColumnarVectorWrapperDirectWithInvertedIndex`
    `ColumnarVectorWrapperDirectWithDeleteDelta`
    `ColumnarVectorWrapperDirectWithDeleteDeltaAndInvertedIndex`
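
    To make the intent of these wrappers easier to picture, here is a greatly simplified,
    hypothetical sketch of the inverted-index case. SimpleIntVector and InvertedIndexVectorSketch
    are invented names for illustration only, not the actual CarbonData interfaces; the delete
    delta variants would additionally skip rows flagged as deleted while filling the vector.

        /** Hypothetical stand-in for a columnar vector; not the CarbonData interface. */
        interface SimpleIntVector {
          void putInt(int rowId, int value);
          void putNull(int rowId);
        }

        /**
         * Sketch of the inverted index idea: values are decoded in stored (sorted) order,
         * and the wrapper redirects each write to the row's original position, so the
         * consumer of the vector sees rows in their original order.
         */
        final class InvertedIndexVectorSketch implements SimpleIntVector {
          private final SimpleIntVector delegate;
          private final int[] invertedIndex; // stored position -> original row position

          InvertedIndexVectorSketch(SimpleIntVector delegate, int[] invertedIndex) {
            this.delegate = delegate;
            this.invertedIndex = invertedIndex;
          }

          @Override public void putInt(int rowId, int value) {
            delegate.putInt(invertedIndex[rowId], value);
          }

          @Override public void putNull(int rowId) {
            delegate.putNull(invertedIndex[rowId]);
          }
        }

    A decoder would then call putInt(storedRow, value) for each value in stored order, and the
    delegate vector ends up filled in the original row order without a separate re-sort pass.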
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata perf-inverted-index

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2822.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2822
   
----
commit 42dfd6adec741e9cf98af92e8a4c3d7810a681e8
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T05:02:18Z

    Add carbon property to configure vector based row pruning push down

commit d9ae60c8f7b0b90d6b5a113043c5ec4cd3acf726
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T06:00:43Z

    Added support for full scan queries for vector direct fill.

commit ff36f4b55f26732b6a669fcd2edd4e958a04818a
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-21T13:44:11Z

    Fix comments

commit 12878a2591795e53826f615dc54fc3d443227a41
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T09:23:14Z

    Added support for pruning pages for vector direct fill.

commit 12bed1a2b875962a90621af6f638a41e7e3f6d4f
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-21T15:27:50Z

    Fix comments

commit 1b08711b3c88539267363735884884499f5586f8
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T11:07:18Z

    Added support for inverted index and delete delta for direct scan queries

----


---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    retest this please


---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    retest this please


---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    retest this please


---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    retest this please


---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    retest this please


---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    retest this please


---

[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2822
 
    retest this please


---