GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/2819

[CARBONDATA-3012] Added support for full scan queries for vector direct fill.

After decompressing a page in the V3 reader, we can fill the data into a vector immediately, without any condition checks inside loops. The complete column page data is therefore set on the column vector in a single batch and handed back to Spark/Presto.

For this purpose, a new method is added to `ColumnPageDecoder`:

```
ColumnPage decodeAndFillVector(byte[] input, int offset, int length,
    ColumnVectorInfo vectorInfo, BitSet nullBits, boolean isLVEncoded)
```

This method takes the vector and fills it in a single loop, without any checks inside the loop.

A new method is also added to `DimensionDataChunkStore`:

```
void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data,
    ColumnVectorInfo vectorInfo);
```

This method likewise takes the vector and fills it in a single loop, without any checks inside the loop.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

 - [ ] Any interfaces changed?
 - [ ] Any backward compatibility impacted?
 - [ ] Document update required?
 - [ ] Testing done. Please provide details on:
   - Whether new unit test cases have been added, or why no new tests are required?
   - How it is tested? Please attach the test report.
   - Is it a performance related change? Please attach the performance test report.
   - Any additional information to help reviewers in testing this change.
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata perf-full-scan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2819.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2819

----

commit 358299f90df98272723f22f43ab025bd1e7fa3e8
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T05:02:18Z

    Add carbon property to configure vector based row pruning push down

commit 658d8cb02b657e9b5887c0348971b9d92087fab2
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T06:00:43Z

    Added support for full scan queries for vector direct fill.

----

---
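To make the direct-fill idea concrete, here is a minimal hypothetical sketch (the `WritableIntVector` interface and class name below are illustrative assumptions, not the actual CarbonData or Spark/Presto API): after a page is decompressed, a single tight loop pushes the whole page into the column vector, with the null check as the only per-row branch.

```java
import java.util.BitSet;

// Illustrative sketch only: 'WritableIntVector' stands in for the writable
// column-vector API that Spark/Presto expose; it is not a CarbonData class.
interface WritableIntVector {
  void putInt(int rowId, int value);
  void putNull(int rowId);
}

final class DirectVectorFillSketch {
  // Fill the entire decoded page into the vector in one batch: one pass,
  // with no predicate or row-pruning checks inside the loop.
  static void fillVector(int[] decodedPage, BitSet nullBits, WritableIntVector vector) {
    for (int rowId = 0; rowId < decodedPage.length; rowId++) {
      if (nullBits.get(rowId)) {
        vector.putNull(rowId);
      } else {
        vector.putInt(rowId, decodedPage[rowId]);
      }
    }
  }
}
```

Compared with the row-by-row path, the per-row work reduces to one array read and one vector write, which is why full scan queries benefit from direct fill.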
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/995/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9063/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/798/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1001/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/802/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9069/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/804/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/805/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/807/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1004/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9072/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/813/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1010/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/816/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9081/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2819

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1013/

---
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2819#discussion_r225792779

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---

@@ -633,6 +622,56 @@ public boolean getBoolean(int rowId) {
    */
   public abstract double getDouble(int rowId);
 
+
+  /**
+   * Get byte value at rowId
+   */
+  public abstract byte[] getByteData();

--- End diff --

Instead of an abstract method, it is better to add a method with a default implementation in this class; the classes that need to provide the proper implementation can then override it.

---
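A minimal sketch of what the reviewer is suggesting (hypothetical code, not the actual patch; the class name `ColumnPageSketch` and the exception message are illustrative assumptions): the base class ships a fail-fast default so that only the subclasses which actually hold byte data need to override `getByteData()`.

```java
// Hypothetical sketch, not the actual CarbonData class: the base class
// provides a default that fails fast, so only byte-backed page
// implementations must override getByteData().
abstract class ColumnPageSketch {
  public byte[] getByteData() {
    throw new UnsupportedOperationException(
        "getByteData is not supported for " + getClass().getSimpleName());
  }
}
```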
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2819#discussion_r225794608

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java ---

@@ -221,49 +229,66 @@ protected DimensionRawColumnChunk getDimensionRawColumnChunk(FileReader fileRead
     int offset = (int) rawColumnPage.getOffSet() + dimensionChunksLength
         .get(rawColumnPage.getColumnIndex()) + dataChunk3.getPage_offset().get(pageNumber);
     // first read the data and uncompressed it
-    return decodeDimension(rawColumnPage, rawData, pageMetadata, offset);
+    return decodeDimension(rawColumnPage, rawData, pageMetadata, offset, vectorInfo);
+  }
+
+  @Override
+  public void decodeColumnPageAndFillVector(DimensionRawColumnChunk dimensionRawColumnChunk,
+      int pageNumber, ColumnVectorInfo vectorInfo) throws IOException, MemoryException {
+    DimensionColumnPage columnPage =
+        decodeColumnPage(dimensionRawColumnChunk, pageNumber, vectorInfo);
+    columnPage.freeMemory();
   }
 
-  private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata,
-      ByteBuffer pageData, int offset, boolean isLocalDictEncodedPage)
+  private ColumnPage decodeDimensionByMeta(DataChunk2 pageMetadata, ByteBuffer pageData, int offset,
+      boolean isLocalDictEncodedPage, ColumnVectorInfo vectorInfo, BitSet nullBitSet)
       throws IOException, MemoryException {
     List<Encoding> encodings = pageMetadata.getEncoders();
     List<ByteBuffer> encoderMetas = pageMetadata.getEncoder_meta();
     String compressorName = CarbonMetadataUtil.getCompressorNameFromChunkMeta(
         pageMetadata.getChunk_meta());
     ColumnPageDecoder decoder = encodingFactory.createDecoder(encodings, encoderMetas,
-        compressorName);
-    return decoder
-        .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage);
+        compressorName, vectorInfo != null);
+    if (vectorInfo != null) {
+      return decoder
+          .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo,
+              nullBitSet, isLocalDictEncodedPage);
+    } else {
+      return decoder
+          .decode(pageData.array(), offset, pageMetadata.data_page_length, isLocalDictEncodedPage);
+    }
   }
 
   protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk rawColumnPage,
-      ByteBuffer pageData, DataChunk2 pageMetadata, int offset)
+      ByteBuffer pageData, DataChunk2 pageMetadata, int offset, ColumnVectorInfo vectorInfo)
       throws IOException, MemoryException {
     List<Encoding> encodings = pageMetadata.getEncoders();
     if (CarbonUtil.isEncodedWithMeta(encodings)) {
-      ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, offset,
-          null != rawColumnPage.getLocalDictionary());
-      decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence, this.compressor));
       int[] invertedIndexes = new int[0];
       int[] invertedIndexesReverse = new int[0];
       // in case of no dictionary measure data types, if it is included in sort columns
       // then inverted index to be uncompressed
+      boolean isExplicitSorted =
+          CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX);
+      int dataOffset = offset;
       if (encodings.contains(Encoding.INVERTED_INDEX)) {
         offset += pageMetadata.data_page_length;
-        if (CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX)) {
+        if (isExplicitSorted) {

--- End diff --

This if check is not required, as the if check above already verifies whether the page is explicitly sorted.

---
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2819#discussion_r225794798

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java ---

@@ -221,49 +229,66 @@ protected DimensionRawColumnChunk getDimensionRawColumnChunk(FileReader fileRead
(same hunk as in the previous comment, quoted up to this line)
       // in case of no dictionary measure data types, if it is included in sort columns
       // then inverted index to be uncompressed
+      boolean isExplicitSorted =
+          CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX);
+      int dataOffset = offset;
       if (encodings.contains(Encoding.INVERTED_INDEX)) {

--- End diff --

Use isExplicitSorted here: whether the inverted-index encoding is present has already been captured in that variable.

---
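Taken together with the previous comment, the two suggestions amount to computing the flag once and reusing it for both checks. A hypothetical sketch of the simplified shape (a fragment of the quoted hunk, not the actual follow-up patch; the elided body stands for the existing inverted-index handling):

```java
// Hypothetical simplification combining both review comments: compute the
// flag once, use it for the outer check, and drop the redundant inner check.
boolean isExplicitSorted =
    CarbonUtil.hasEncoding(pageMetadata.encoders, Encoding.INVERTED_INDEX);
if (isExplicitSorted) {
  offset += pageMetadata.data_page_length;
  // ... decompress the inverted index here; no second isExplicitSorted
  // check is needed, since this branch already implies it
}
```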