GitHub user dhatchayani opened a pull request:
https://github.com/apache/carbondata/pull/2975 [WIP][CARBONDATA-3145] Read improvement for complex column pages while querying **Problem:** Column page is decoded for getting each row of a complex primitive column. **Solution:** Decode a page it once then use the same. - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Existing UT - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-3145 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2975.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2975 ---- commit 2f1f8b0081f3229ef151f8171b19fa76006065fb Author: dhatchayani <dhatcha.official@...> Date: 2018-12-05T07:10:56Z [CARBONDATA-3145] Read improvement for complex column pages while querying ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1642/ --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2975#discussion_r238971934 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java --- @@ -29,6 +32,8 @@ protected int blockIndex; + protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>(); --- End diff -- I think this Map will be better to cache the 'byte[] data', because getChunkData also takes time --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9902/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1853/ --- |
In reply to this post by qiuchenjian-2
Github user dhatchayani commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2975#discussion_r238988286 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java --- @@ -29,6 +32,8 @@ protected int blockIndex; + protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>(); --- End diff -- This map is to hold to DimensionColumnPage as we are decoding the same page again and again. 'byte[] data' for each row is the query result, we need to get that from the decodedPage --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2975#discussion_r238989259 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java --- @@ -29,6 +32,8 @@ protected int blockIndex; + protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>(); --- End diff -- OK, I didn't mention the pageNumber and rowNumber > This map is to hold to DimensionColumnPage as we are decoding the same page again and again. 'byte[] data' for each row is the query result, we need to get that from the decodedPage OK, I didn't mention the pageNumber and rowNumber --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1646/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9906/ --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2975 @dhatchayani can u please update the performance result --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1857/ --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on the issue:
https://github.com/apache/carbondata/pull/2975 Are the decoded DimensionColumnPage are cached in DimensionRawColumnChunk, so the function of cache is used by other code. Such as, DimensionRawColumnChunk contains a Map<Integer,DimensionColumnPage> cacheDimPages, when a Page is decoded, it will cache in this map. --- |
In reply to this post by qiuchenjian-2
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2975 @qiuchenjian Actually, DimensionColumnPage is decoded from DimensionRawColumnChunk. So it is meaningless to cache it in DimensionRawColumnChunk. Both serves different purposes. --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on the issue:
https://github.com/apache/carbondata/pull/2975 @dhatchayani I read the DimensionRawColumnChunk, this class has cache the decoded DimensionColumnPage, is your Map added useful? public DimensionColumnPage decodeColumnPage(int pageNumber) { assert pageNumber < pagesCount; if (dataChunks == null) { dataChunks = new DimensionColumnPage[pagesCount]; } if (dataChunks[pageNumber] == null) { try { dataChunks[pageNumber] = chunkReader.decodeColumnPage(this, pageNumber, null); } catch (IOException | MemoryException e) { throw new RuntimeException(e); } } return dataChunks[pageNumber]; } --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1671/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9931/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2975 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1883/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2975 LGTM --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2975 LGTM.. @ravipesala yeah u are right...Based on PR desc I asked for performance report, now it's okay :) --- |
In reply to this post by qiuchenjian-2
|
Free forum by Nabble | Edit this page |