[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

qiuchenjian-2
GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/2819

    [CARBONDATA-3012] Added support for full scan queries for vector direct fill.

    After decompressing a page in our V3 reader, we can immediately fill the data into a vector without any condition checks inside loops. The complete column page data is set into the column vector in a single batch and handed back to Spark/Presto.
    For this purpose, a new method is added in `ColumnPageDecoder`:
    ```
    ColumnPage decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo,
          BitSet nullBits, boolean isLVEncoded)
    ```
    The above method takes the vector and fills it in a single loop, without any checks inside the loop.
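    For illustration, here is a minimal, self-contained sketch of the direct-fill idea: decode the whole page once, push every value into the vector in one tight loop, and patch nulls afterwards from the bitmap. The types below are simplified stand-ins, not the actual CarbonData classes.
    ```
    import java.util.BitSet;

    // Simplified stand-in for a Spark/Presto column vector (hypothetical type).
    interface SimpleVector {
      void putInt(int rowId, int value);
      void putNull(int rowId);
    }

    class DirectFillSketch {
      static void fillFromDecodedPage(int[] decodedPage, SimpleVector vector, BitSet nullBits) {
        // Single pass, no per-row condition checks: every decoded value goes straight in.
        for (int rowId = 0; rowId < decodedPage.length; rowId++) {
          vector.putInt(rowId, decodedPage[rowId]);
        }
        // Null positions are patched afterwards from the null bitmap.
        for (int rowId = nullBits.nextSetBit(0); rowId >= 0; rowId = nullBits.nextSetBit(rowId + 1)) {
          vector.putNull(rowId);
        }
      }
    }
    ```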
   
    A new method is also added in `DimensionDataChunkStore`:
   
    ```
     void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data,
          ColumnVectorInfo vectorInfo);
    ```
    This method likewise fills the vector in a single loop, without any checks inside the loop.
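    As a rough illustration, such a fill can restore row order via the reverse inverted index while still writing the vector in a single pass per page (again with simplified stand-in types; the real method operates on the chunk store's byte data):
    ```
    // Hypothetical sketch: when the page was stored in sorted order, the reverse
    // inverted index maps each row back to its position in the sorted data.
    interface RowVector {
      void putInt(int rowId, int value);
    }

    class InvertedIndexFillSketch {
      static void fillVector(int[] invertedIndexReverse, int[] data, RowVector vector) {
        if (invertedIndexReverse == null || invertedIndexReverse.length == 0) {
          // Data already in row order: one straight loop.
          for (int rowId = 0; rowId < data.length; rowId++) {
            vector.putInt(rowId, data[rowId]);
          }
        } else {
          // Data stored in sorted order: look up each row's sorted position, still one loop.
          for (int rowId = 0; rowId < data.length; rowId++) {
            vector.putInt(rowId, data[invertedIndexReverse[rowId]]);
          }
        }
      }
    }
    ```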
   
   
    Be sure to complete all of the following checklist items to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking them into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata perf-full-scan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2819.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2819
   
----
commit 358299f90df98272723f22f43ab025bd1e7fa3e8
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T05:02:18Z

    Add carbon property to configure vector based row pruning push down

commit 658d8cb02b657e9b5887c0348971b9d92087fab2
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T06:00:43Z

    Added support for full scan queries for vector direct fill.

----


---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/995/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9063/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/798/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1001/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/802/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9069/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/804/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/805/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/807/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1004/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9072/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/813/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1010/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/816/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9081/



---

[GitHub] carbondata issue #2819: [CARBONDATA-3012] Added support for full scan querie...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2819
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1013/



---

[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2819#discussion_r225792779
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---
    @@ -633,6 +622,56 @@ public boolean getBoolean(int rowId) {
        */
       public abstract double getDouble(int rowId);
     
    +
    +
    +
    +
    +  /**
    +   * Get byte value at rowId
    +   */
    +  public abstract byte[] getByteData();
    --- End diff --
   
    Instead of an abstract method, it is better to add a method with a default implementation in this class; the classes where we want to provide the proper implementation can then override it.
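    For example, a hypothetical sketch of the suggested pattern (placeholder class names, not the actual ColumnPage hierarchy):
    ```
    // Default implementation in the base class that fails fast; only the page types
    // that actually hold byte data override it.
    abstract class BasePageSketch {
      public byte[] getByteData() {
        throw new UnsupportedOperationException(
            getClass().getName() + " does not support getByteData");
      }
    }

    class BytePageSketch extends BasePageSketch {
      private final byte[] data;

      BytePageSketch(byte[] data) {
        this.data = data;
      }

      @Override
      public byte[] getByteData() {
        return data;
      }
    }
    ```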


---

[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2819#discussion_r225794608
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java ---
    @@ -221,49 +229,66 @@ protected DimensionRawColumnChunk getDimensionRawColumnChunk(FileReader fileRead
         int offset = (int) rawColumnPage.getOffSet() + dimensionChunksLength
             .get(rawColumnPage.getColumnIndex()) + dataChunk3.getPage_offset().get(pageNumber);
         // first read the data and uncompressed it
    -    return decodeDimension(rawColumnPage, rawData, pageMetadata, offset);
    +    return decodeDimension(rawColumnPage, rawData, pageMetadata, offset, vectorInfo);
    +  }
    +
    +  @Override
    +  public void decodeColumnPageAndFillVector(DimensionRawColumnChunk dimensionRawColumnChunk,
    +      int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, MemoryException {
    +    DimensionColumnPage columnPage =
    +        decodeColumnPage(dimensionRawColumnChunk, pageNumber, vectorInfo);
    +    columnPage.freeMemory();
       }
     
    -  private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata,
    -      ByteBuffer pageData, int offset, boolean isLocalDictEncodedPage)
    +  private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pageData, int offset,
    +      boolean isLocalDictEncodedPage, ColumnVectorInfo vectorInfo, BitSet nullBitSet)
           throws IOException, MemoryException {
         List<Encoding> encodings = pageMetadata.getEncoders();
         List<ByteBuffer> encoderMetas = pageMetadata.getEncoder_meta();
         String compressorName = CarbonMetadataUtil.getCompressorNameFromChunkMeta(
             pageMetadata.getChunk_meta());
         ColumnPageDecoder decoder = encodingFactory.createDecoder(encodings, encoderMetas,
    -        compressorName);
    -    return decoder
    -        .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage);
    +        compressorName, vectorInfo != null);
    +    if (vectorInfo != null) {
    +      return decoder
    +          .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo,
    +              nullBitSet, isLocalDictEncodedPage);
    +    } else {
    +      return decoder
    +          .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage);
    +    }
       }
     
       protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage,
    -      ByteBuffer pageData, DataChunk2 pageMetadata, int offset)
    +      ByteBuffer pageData, DataChunk2 pageMetadata, int offset, ColumnVectorInfo vectorInfo)
           throws IOException, MemoryException {
         List<Encoding> encodings = pageMetadata.getEncoders();
         if (CarbonUtil.isEncodedWithMeta(encodings)) {
    -      ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset,
    -          null != rawColumnPage.getLocalDictionary());
    -      decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence, this.compressor));
           int[] invertedIndexes = new int[0];
           int[] invertedIndexesReverse = new int[0];
           // in case of no dictionary measure data types, if it is included in sort columns
           // then inverted index to be uncompressed
    +      boolean isExplicitSorted =
    +          CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX);
    +      int dataOffset = offset;
           if (encodings.contains(Encoding.INVERTED_INDEX)) {
             offset += pageMetadata.data_page_length;
    -        if (CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX)) {
    +        if (isExplicitSorted) {
    --- End diff --
   
    This if check is not required; the enclosing if already checks whether the page is explicitly sorted.
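    Roughly, the two checks collapse into one (a trimmed control-flow sketch of the reviewer's point, not the actual reader code):
    ```
    // Sketch only: once isExplicitSorted is derived from the encodings, the outer
    // branch can use it directly and the nested check becomes redundant.
    boolean isExplicitSorted =
        CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX);
    if (isExplicitSorted) {
      offset += pageMetadata.data_page_length;
      // read and uncompress the inverted index here; no second isExplicitSorted check needed
    }
    ```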


---

[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2819#discussion_r225794798
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java ---
    @@ -221,49 +229,66 @@ protected DimensionRawColumnChunk getDimensionRawColumnChunk(FileReader fileRead
         int offset = (int) rawColumnPage.getOffSet() + dimensionChunksLength
             .get(rawColumnPage.getColumnIndex()) + dataChunk3.getPage_offset().get(pageNumber);
         // first read the data and uncompressed it
    -    return decodeDimension(rawColumnPage, rawData, pageMetadata, offset);
    +    return decodeDimension(rawColumnPage, rawData, pageMetadata, offset, vectorInfo);
    +  }
    +
    +  @Override
    +  public void decodeColumnPageAndFillVector(DimensionRawColumnChunk dimensionRawColumnChunk,
    +      int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, MemoryException {
    +    DimensionColumnPage columnPage =
    +        decodeColumnPage(dimensionRawColumnChunk, pageNumber, vectorInfo);
    +    columnPage.freeMemory();
       }
     
    -  private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata,
    -      ByteBuffer pageData, int offset, boolean isLocalDictEncodedPage)
    +  private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pageData, int offset,
    +      boolean isLocalDictEncodedPage, ColumnVectorInfo vectorInfo, BitSet nullBitSet)
           throws IOException, MemoryException {
         List<Encoding> encodings = pageMetadata.getEncoders();
         List<ByteBuffer> encoderMetas = pageMetadata.getEncoder_meta();
         String compressorName = CarbonMetadataUtil.getCompressorNameFromChunkMeta(
             pageMetadata.getChunk_meta());
         ColumnPageDecoder decoder = encodingFactory.createDecoder(encodings, encoderMetas,
    -        compressorName);
    -    return decoder
    -        .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage);
    +        compressorName, vectorInfo != null);
    +    if (vectorInfo != null) {
    +      return decoder
    +          .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo,
    +              nullBitSet, isLocalDictEncodedPage);
    +    } else {
    +      return decoder
    +          .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage);
    +    }
       }
     
       protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage,
    -      ByteBuffer pageData, DataChunk2 pageMetadata, int offset)
    +      ByteBuffer pageData, DataChunk2 pageMetadata, int offset, ColumnVectorInfo vectorInfo)
           throws IOException, MemoryException {
         List<Encoding> encodings = pageMetadata.getEncoders();
         if (CarbonUtil.isEncodedWithMeta(encodings)) {
    -      ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset,
    -          null != rawColumnPage.getLocalDictionary());
    -      decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence, this.compressor));
           int[] invertedIndexes = new int[0];
           int[] invertedIndexesReverse = new int[0];
           // in case of no dictionary measure data types, if it is included in sort columns
           // then inverted index to be uncompressed
    +      boolean isExplicitSorted =
    +          CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX);
    +      int dataOffset = offset;
           if (encodings.contains(Encoding.INVERTED_INDEX)) {
    --- End diff --
   
    Use isExplicitSorted here; whether the inverted index is present in the encodings is already captured in that variable.


---