Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8251/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/180/ --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2628 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8265/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/194/ --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin closed the pull request at:
https://github.com/apache/carbondata/pull/2628 --- |
In reply to this post by qiuchenjian-2
GitHub user xuchuanyin reopened a pull request:
https://github.com/apache/carbondata/pull/2628

[CARBONDATA-2851][CARBONDATA-2852] Support zstd as column compressor in final store

1. add zstd compressor for compressing column data
2. add zstd support in thrift
3. since zstd does not support zero-copy while compressing, offheap will not take effect for zstd
4. Column compressor is configured through system property and can be changed in each load. Before loading, Carbondata will get the compressor and use that compressor during that loading. During querying, carbondata will get the compressor information from metadata in the file data.
5. Also support compressing streaming table using zstd. The compressor info is stored in FileHeader of the streaming file.
6. This PR also considered and verified on the legacy store and compaction

A simple test with 1.2GB raw CSV data shows that the size (in MB) of final store with different compressor:

| local dictionary | snappy | zstd | Size Reduced |
| --- | --- | --- | --- |
| enabled | 335 | 207 | 38.2% |
| disabled | 375 | 225 | 40% |

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed?
  `Yes, only internal used interfaces are changed`
- [x] Any backward compatibility impacted?
  `Yes, backward compatibility is handled`
- [x] Document update required?
  `Yes`
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
    `Added tests`
  - How it is tested? Please attach test report.
    `Tested in local machine`
  - Is it a performance related change? Please attach the performance test report.
    `The size of final store has been decreased by 40% compared with default snappy`
  - Any additional information to help reviewers in testing this change.
    `NA`
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
  `NA`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0810_support_zstd_compressor_final_store

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2628.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2628

----
commit c171ee13136785110f6fff8104afebc4b2f222c7
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-08-10T14:02:57Z

    Support zstd as column compressor in final store

    1. add zstd compressor for compressing column data
    2. add zstd support in thrift
    3. legacy store is not considered in this commit
    4. since zstd does not support zero-copy while compressing, offheap will not take effect for zstd
    5. support lazy load for compressor

commit 6448e6f21da66172775b625730b922fdfa57822d
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-08-13T13:45:42Z

    Support new compressor on legacy store

    In query procedure, we need to decompress the column page. Previously we get the compressor from system property. Now since we support new compressors, we should read the compressor information from the metadata in datafiles.
    This PR also solve the compatibility related problems on V1/V2 store where we only support snappy.

commit 2815c84f1d5fd99ff37ba6890d98fb2b73a95b00
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-08-14T08:38:00Z

    fix comments

commit ac95c25fca1c37f10f9cce0db76062207d0d3cee
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-08-23T09:35:23Z

    Determine the column compressor before data loading

    we will get the column compressor before data loading/compaction start, so that it can make all the pages use the same compressor in case of concurrent modifying compressor during loading.

commit a672d3baad1c476308c0aec5133e418afeaeacb2
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-08-27T11:18:30Z

    set compressor in carbon load model

    column compressor is necessary for carbon load model, otherwise load will fail.

commit d05c1cc38e1fa42ef94f70577ee2a715f649ebe3
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-08-30T04:02:33Z

    fix error in test

commit fb8cdfb1258b477f7d8c867ee74bd59386725d9c
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-09-03T03:58:02Z

    fix review comments

    optimize parameters for column page, use columnPageEncodeMeta instead of its members

----
--- |
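For readers skimming the thread, the compressor handling described in the PR text (fix the compressor once before a load starts, read it back from file metadata at query time, and fall back to snappy for legacy V1/V2 stores) might be sketched roughly as follows. This is only an illustration in Python, not the PR's Java code: the property key `column.compressor`, the `LoadModel` class, and both function names are hypothetical and are not taken from CarbonData.

```python
from dataclasses import dataclass
from typing import Optional

# Legacy V1/V2 stores only supported snappy, per the PR description.
DEFAULT_COMPRESSOR = "snappy"
SUPPORTED_COMPRESSORS = {"snappy", "zstd"}

# Hypothetical stand-in for the system property; the key is NOT CarbonData's real one.
properties = {"column.compressor": "snappy"}


@dataclass(frozen=True)
class LoadModel:
    """Hypothetical load model: the compressor is fixed for the whole load."""
    column_compressor: str


def start_load() -> LoadModel:
    """Read the property once, before loading, so every page of this load
    uses the same compressor even if the property changes mid-load."""
    name = properties.get("column.compressor", DEFAULT_COMPRESSOR).lower()
    if name not in SUPPORTED_COMPRESSORS:
        raise ValueError(f"unsupported column compressor: {name}")
    return LoadModel(column_compressor=name)


def compressor_for_read(metadata_compressor: Optional[str]) -> str:
    """At query time, trust the name stored in the file metadata; files
    without that field (legacy stores) are assumed to be snappy."""
    return metadata_compressor or DEFAULT_COMPRESSOR


if __name__ == "__main__":
    properties["column.compressor"] = "zstd"
    model = start_load()
    # A concurrent change after the load has started does not affect it.
    properties["column.compressor"] = "snappy"
    print(model.column_compressor)
    print(compressor_for_read(None))
```

The point of the frozen load model is that the choice travels with the load rather than being re-read from global state, which is what the "Determine the column compressor before data loading" commit describes.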
In reply to this post by qiuchenjian-2
Github user xuchuanyin closed the pull request at:
https://github.com/apache/carbondata/pull/2628 --- |
In reply to this post by qiuchenjian-2
GitHub user xuchuanyin reopened a pull request:
https://github.com/apache/carbondata/pull/2628 [CARBONDATA-2851][CARBONDATA-2852] Support zstd as column compressor in final store --- |
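As a quick sanity check on the "Size Reduced" figures quoted in the PR description (335 MB vs 207 MB with local dictionary enabled, 375 MB vs 225 MB with it disabled), the percentages can be recomputed as below. This is just arithmetic on the numbers already in the thread; the function name is made up for illustration.

```python
def size_reduction_pct(snappy_mb: float, zstd_mb: float) -> float:
    """Percentage by which the zstd store is smaller than the snappy store."""
    return round((snappy_mb - zstd_mb) * 100 / snappy_mb, 1)


# Store sizes in MB as reported in the PR description: (snappy, zstd).
reported = {"enabled": (335, 207), "disabled": (375, 225)}

if __name__ == "__main__":
    for local_dictionary, (snappy_mb, zstd_mb) in reported.items():
        print(local_dictionary, size_reduction_pct(snappy_mb, zstd_mb))
```

Both rows agree with the table (38.2% and 40%), so the "decreased by 40%" claim in the checklist corresponds to the case with local dictionary disabled.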
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2628 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8275/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/204/ --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2628 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8277/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/206/ --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2628 I raised another PR #2689 to replace this PR --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2628 fix conflicts --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/254/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2628 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8324/ --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2628 retest this please --- |