[GitHub] carbondata pull request #2350: [CARBONDATA-2553] support ZSTD compression fo...

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2350: [CARBONDATA-2553] support ZSTD compression fo...

qiuchenjian-2
GitHub user kevinjmh opened a pull request:

    https://github.com/apache/carbondata/pull/2350

    [CARBONDATA-2553] support ZSTD compression for sort temp file

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinjmh/carbondata zstd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2350.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2350
   
----
commit 0777ffd96a486706ced46cd1270e576644a840fa
Author: Manhua <kevinjmh@...>
Date:   2018-05-29T01:21:52Z

    support ZSTD compression for sort temp file

commit 61d5f0ca2a0411b325624e9ae7c9e8f04461d104
Author: Manhua <kevinjmh@...>
Date:   2018-05-29T01:36:15Z

    update comment and doc

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5129/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6145/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4982/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Can you provide description for this PR? And introduce the ZSTD library?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2350: [CARBONDATA-2553] support ZSTD compression fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kevinjmh closed the pull request at:

    https://github.com/apache/carbondata/pull/2350


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2350: [CARBONDATA-2553] support ZSTD compression fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
GitHub user kevinjmh reopened a pull request:

    https://github.com/apache/carbondata/pull/2350

    [CARBONDATA-2553] support ZSTD compression for sort temp file

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   
    ZSTD project Home: http://facebook.github.io/zstd/
   
    Zstandard is a real-time compression algorithm, providing high compression ratios. It can offer higher compression  ratio than snappy with similar speed. In our test, we find that the file size of sort temp file generated by zstd almost half of  snappy.
   
    Hadoop and Spark also imported this project. See:
   
    https://github.com/apache/hadoop/commit/a0a276162147e843a5a4e028abdca5b66f5118da
   
    https://github.com/apache/spark/commit/444bce1c98c45147fe63e2132e9743a0c5e49598
   
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinjmh/carbondata zstd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2350.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2350
   
----
commit 0777ffd96a486706ced46cd1270e576644a840fa
Author: Manhua <kevinjmh@...>
Date:   2018-05-29T01:21:52Z

    support ZSTD compression for sort temp file

commit 61d5f0ca2a0411b325624e9ae7c9e8f04461d104
Author: Manhua <kevinjmh@...>
Date:   2018-05-29T01:36:15Z

    update comment and doc

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kevinjmh commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    PR description  update


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4984/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2350: [CARBONDATA-2553] support ZSTD compression fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2350#discussion_r191330085
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ---
    @@ -1290,11 +1290,11 @@ public String getSortTempCompressor() {
         String compressor = getProperty(CarbonCommonConstants.CARBON_SORT_TEMP_COMPRESSOR,
             CarbonCommonConstants.CARBON_SORT_TEMP_COMPRESSOR_DEFAULT).toUpperCase();
         if (compressor.isEmpty() || "SNAPPY".equals(compressor) || "GZIP".equals(compressor)
    -        || "BZIP2".equals(compressor) || "LZ4".equals(compressor)) {
    +        || "BZIP2".equals(compressor) || "LZ4".equals(compressor) || "ZSTD".equals(compressor)) {
           return compressor;
         } else {
           LOGGER.warn("The ".concat(CarbonCommonConstants.CARBON_SORT_TEMP_COMPRESSOR)
    -          .concat(" configuration value is invalid. Only snappy,gzip,bip2,lz4 and")
    +          .concat(" configuration value is invalid. Only snappy,gzip,bip2,lz4,zstd and")
    --- End diff --
   
    change to `, zstd`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2350: [CARBONDATA-2553] support ZSTD compression fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2350#discussion_r191330221
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java ---
    @@ -368,6 +372,8 @@ public DataOutputStream getDataOutputStream(String path, FileFactory.FileType fi
           outputStream = new SnappyOutputStream(new FileOutputStream(path));
         } else if ("LZ4".equalsIgnoreCase(compressor)) {
           outputStream = new LZ4BlockOutputStream(new FileOutputStream(path));
    +    } else if ("ZSTD".equalsIgnoreCase(compressor)) {
    +      outputStream = new ZstdOutputStream(new FileOutputStream(path), 1);
    --- End diff --
   
    what is the meaning of 1?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2350: [CARBONDATA-2553] support ZSTD compression fo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kevinjmh commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2350#discussion_r191341574
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java ---
    @@ -368,6 +372,8 @@ public DataOutputStream getDataOutputStream(String path, FileFactory.FileType fi
           outputStream = new SnappyOutputStream(new FileOutputStream(path));
         } else if ("LZ4".equalsIgnoreCase(compressor)) {
           outputStream = new LZ4BlockOutputStream(new FileOutputStream(path));
    +    } else if ("ZSTD".equalsIgnoreCase(compressor)) {
    +      outputStream = new ZstdOutputStream(new FileOutputStream(path), 1);
    --- End diff --
   
    compress level 1.  We have test using default level 3 which can get better compress ratio but cost more time. As for sort temp file which is not used for storage, level 1 is cost-effective.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6155/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4994/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5137/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6181/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5020/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2350: [CARBONDATA-2553] support ZSTD compression for sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2350
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6323/



---
12