[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

qiuchenjian-2
GitHub user dhatchayani opened a pull request:

    https://github.com/apache/carbondata/pull/2975

    [WIP][CARBONDATA-3145] Read improvement for complex column pages while querying

    **Problem:**
    Column page is decoded for getting each row of a complex primitive column.
   
    **Solution:**
    Decode a page it once then use the same.
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [x] Testing done
           Existing UT
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-3145

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2975
   
----
commit 2f1f8b0081f3229ef151f8171b19fa76006065fb
Author: dhatchayani <dhatcha.official@...>
Date:   2018-12-05T07:10:56Z

    [CARBONDATA-3145] Read improvement for complex column pages while querying

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1642/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2975#discussion_r238971934
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java ---
    @@ -29,6 +32,8 @@
     
       protected int blockIndex;
     
    +  protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>();
    --- End diff --
   
    I think this Map will be better to cache the 'byte[] data',   because getChunkData also takes time


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9902/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1853/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user dhatchayani commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2975#discussion_r238988286
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java ---
    @@ -29,6 +32,8 @@
     
       protected int blockIndex;
     
    +  protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>();
    --- End diff --
   
    This map is to hold to DimensionColumnPage as we are decoding the same page again and again. 'byte[] data' for each row is the query result, we need to get that from the decodedPage


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2975: [WIP][CARBONDATA-3145] Read improvement for c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2975#discussion_r238989259
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java ---
    @@ -29,6 +32,8 @@
     
       protected int blockIndex;
     
    +  protected Map<Integer, DimensionColumnPage> decodedPagesWithPageNumber = new HashMap<>();
    --- End diff --
   
    OK, I didn't mention the pageNumber and rowNumber
   
    > This map is to hold to DimensionColumnPage as we are decoding the same page again and again. 'byte[] data' for each row is the query result, we need to get that from the decodedPage
   
    OK, I didn't mention the pageNumber and rowNumber


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1646/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9906/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    @dhatchayani can u please update the performance result


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1857/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Are the decoded DimensionColumnPage are cached in DimensionRawColumnChunk, so the function of cache is used by other code.
    Such as, DimensionRawColumnChunk contains a Map<Integer,DimensionColumnPage> cacheDimPages, when a Page is decoded, it will cache in this map.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user dhatchayani commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    @qiuchenjian Actually, DimensionColumnPage is decoded from DimensionRawColumnChunk. So it is meaningless to cache it in DimensionRawColumnChunk. Both serves different purposes.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    @dhatchayani  I read the DimensionRawColumnChunk, this class has cache the decoded  DimensionColumnPage,   is your Map added useful?
   
      public DimensionColumnPage decodeColumnPage(int pageNumber) {
        assert pageNumber < pagesCount;
        if (dataChunks == null) {
          dataChunks = new DimensionColumnPage[pagesCount];
        }
        if (dataChunks[pageNumber] == null) {
          try {
            dataChunks[pageNumber] = chunkReader.decodeColumnPage(this, pageNumber, null);
          } catch (IOException | MemoryException e) {
            throw new RuntimeException(e);
          }
        }
   
        return dataChunks[pageNumber];
      }



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1671/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9931/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1883/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [CARBONDATA-3145] Read improvement for complex colum...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2975: [CARBONDATA-3145] Avoid duplicate decoding for compl...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2975
 
    LGTM.. @ravipesala yeah u are right...Based on PR desc I asked for performance report, now it's okay :)


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2975: [CARBONDATA-3145] Avoid duplicate decoding fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2975


---
12