[GitHub] carbondata pull request #2689: [CARBONDATA-2851][CARBONDATA-2852] Support zs...

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2689

    [CARBONDATA-2851][CARBONDATA-2852] Support zstd as column compressor in final store

    1. Add a zstd compressor for compressing column data.
    2. Add zstd support in thrift.
    3. Since zstd does not support zero-copy while compressing, offheap will not take effect for zstd.
    4. The column compressor is configured through a system property and can be changed for each load. Before loading, CarbonData gets the configured compressor and uses it throughout that load. During querying, CarbonData reads the compressor information from the metadata in the data files.
    5. Also support compressing streaming tables with zstd. The compressor info is stored in the FileHeader of the streaming file.
    6. This PR has also been considered and verified against the legacy store and compaction.
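
As a rough illustration of points 4 and 5, the sketch below writes the compressor name into a small header in front of the compressed bytes, so the reader picks the codec from the data itself rather than from a global setting. This is not CarbonData code: `write_page`, `read_page`, `CODECS`, and the header layout are hypothetical, and `zlib`/`bz2` stand in for snappy/zstd because those are not in the Python standard library.

```python
import bz2
import struct
import zlib

# Stand-in codecs (assumption): zlib and bz2 play the roles of snappy and zstd.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def write_page(data: bytes, codec_name: str) -> bytes:
    """Compress data and prefix it with a header that names the codec."""
    compress, _ = CODECS[codec_name]
    name = codec_name.encode("utf-8")
    # Hypothetical header layout: 1-byte name length, then the name itself.
    return struct.pack("B", len(name)) + name + compress(data)


def read_page(blob: bytes) -> bytes:
    """Decompress with the codec named in the header, not a global setting."""
    (name_len,) = struct.unpack_from("B", blob, 0)
    codec_name = blob[1:1 + name_len].decode("utf-8")
    _, decompress = CODECS[codec_name]
    return decompress(blob[1 + name_len:])
```

Because the name travels with the data, pages written with different codecs in different loads can still be read back side by side.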
   
    A simple test with 1.2 GB of raw CSV data shows the size (in MB) of the final store with each compressor:
   
    | local dictionary | snappy | zstd | Size Reduced |
    | --- | --- | --- | --- |
    | enabled | 335 | 207 | 38.2% |
    | disabled | 375 | 225 | 40% |
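
The "Size Reduced" column can be reproduced from the two size columns. A minimal check, assuming the reduction is measured relative to the snappy size (the function name is made up for this example):

```python
def size_reduction_percent(snappy_mb: float, zstd_mb: float) -> float:
    """Percentage by which the zstd store is smaller than the snappy store."""
    return (snappy_mb - zstd_mb) / snappy_mb * 100


# Sizes in MB, taken from the table above.
for label, snappy_mb, zstd_mb in [("enabled", 335, 207), ("disabled", 375, 225)]:
    print(f"{label}: {size_reduction_percent(snappy_mb, zstd_mb):.1f}%")
# prints:
# enabled: 38.2%
# disabled: 40.0%
```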
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [x] Any interfaces changed?
     `Yes, only internally used interfaces are changed`
     - [x] Any backward compatibility impacted?
     `Yes, backward compatibility is handled`
     - [x] Document update required?
     `Yes`
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            `Added tests`
            - How it is tested? Please attach test report.
            `Tested on a local machine`
            - Is it a performance related change? Please attach the performance test report.
            `The size of the final store decreased by about 40% compared with the default snappy compressor`
            - Any additional information to help reviewers in testing this change.
            `NA`

     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     `NA`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0813_read_compressor_from_datafiles

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2689.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2689
   
----
commit d9539ccb832a73b2b6438d1c35fee1cc7aff8f5e
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-10T14:02:57Z

    Support zstd as column compressor in final store
   
    1. add zstd compressor for compressing column data
    2. add zstd support in thrift
    3. legacy store is not considered in this commit
    4. since zstd does not support zero-copy while compressing, offheap will
    not take effect for zstd
    5. support lazy load for compressor

commit 3ef78a97c75f1514a84c6ae7b694c893eaef1eb7
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-13T13:45:42Z

    Support new compressor on legacy store
   
    In the query procedure, we need to decompress the column page.
    Previously we got the compressor from the system property. Now that we
    support new compressors, we should read the compressor information from
    the metadata in the datafiles.
    This PR also solves the compatibility-related problems on the V1/V2
    store, where we only support snappy.
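
The compatibility rule described in this commit message can be pictured roughly as follows. This is a sketch only, not the actual CarbonData logic: `compressor_for`, its parameters, and the treatment of a missing name in newer files are assumptions made for illustration.

```python
from typing import Optional

# Per the commit message, V1/V2 stores only support snappy.
LEGACY_DEFAULT = "snappy"


def compressor_for(file_version: int, recorded_name: Optional[str]) -> str:
    """Pick the compressor name to use when reading a data file.

    file_version and recorded_name are hypothetical inputs that stand for
    the store version and the compressor name found in the file metadata.
    """
    if file_version <= 2:
        # Legacy V1/V2 files predate configurable compressors.
        return LEGACY_DEFAULT
    # Assumption: a newer file with no recorded name is treated as snappy too.
    return recorded_name or LEGACY_DEFAULT
```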

commit b5bafcb14aae0650c30d41537ea295c220693a04
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-14T08:38:00Z

    fix comments

commit 2a270952a1fec1552bde78c30ec64154ccdd6327
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-23T09:35:23Z

    Determine the column compressor before data loading
   
    We get the column compressor before data loading/compaction starts,
    so that all pages use the same compressor even if the compressor
    setting is modified concurrently during loading.
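
The idea in this commit message, resolving the setting once and reusing it for every page, might look like the sketch below. It is illustrative only: the `properties` dict, the `column.compressor` key, and `load` are hypothetical stand-ins for the system property and the load flow, and `zlib`/`bz2` stand in for snappy/zstd.

```python
import bz2
import zlib

# Hypothetical mutable global configuration (stands in for a system property).
properties = {"column.compressor": "zlib"}

# Stand-in codecs (assumption): zlib and bz2 instead of snappy and zstd.
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress}


def load(pages, on_page=None):
    """Compress every page with the compressor resolved at the start of the load."""
    # Read the setting exactly once, before any page is written.
    codec_name = properties["column.compressor"]
    compress = CODECS[codec_name]
    out = []
    for page in pages:
        out.append((codec_name, compress(page)))
        if on_page is not None:
            on_page()  # hook used here to simulate a concurrent config change
    return out
```

If `on_page` changes `properties["column.compressor"]` in the middle of a load, the remaining pages are still written with the compressor chosen at the start; only the next load picks up the new value.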

commit 6a9c0b914de9c4b26405d2979b555dc0165f27b7
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-27T11:18:30Z

    set compressor in carbon load model
   
    The column compressor must be set in the carbon load model; otherwise
    the load will fail.

commit 0934551c09f3d563d27a1bd91c70e8c62ea60527
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-30T04:02:33Z

    fix error in test

commit 10ccff8d4a6c3d14aa5feef7251e1b155dc4d0c5
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-09-03T03:58:02Z

    fix review comments
   
    optimize parameters for column page, use columnPageEncodeMeta instead of
    its members

----


---

[GitHub] carbondata issue #2689: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    This PR replaces PR #2628; the CI for the original PR has problems.


---

[GitHub] carbondata issue #2689: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8282/



---

[GitHub] carbondata issue #2689: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/211/



---

[GitHub] carbondata issue #2689: WIP:[CARBONDATA-2851][CARBONDATA-2852] Support zstd ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    retest this please


---

[GitHub] carbondata issue #2689: WIP:[CARBONDATA-2851][CARBONDATA-2852] Support zstd ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/231/



---

[GitHub] carbondata issue #2689: WIP:[CARBONDATA-2851][CARBONDATA-2852] Support zstd ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8301/



---

[GitHub] carbondata issue #2689: WIP: test

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8323/



---

[GitHub] carbondata issue #2689: WIP: test

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/253/



---

[GitHub] carbondata issue #2689: WIP: test

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/5/



---

[GitHub] carbondata issue #2689: WIP: test

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/103/



---

[GitHub] carbondata issue #2689: WIP: test

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8341/



---

[GitHub] carbondata issue #2689: WIP: test

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2689
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/271/



---

[GitHub] carbondata pull request #2689: WIP: test

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin closed the pull request at:

    https://github.com/apache/carbondata/pull/2689


---