GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/1953

[CARBONDATA-2091][DataLoad] Support specifying sort column bounds in data loading

Enhance data loading performance by specifying sort column bounds:

1. Add row range number during convert-process-step
2. Dispatch rows to each sorter by range number
3. Sort/Write process step can be done concurrently in each range

Tests added and docs updated.

After implementing this feature, data load performance improved by about 25% (80MB/s/Node -> 102MB/s/Node) in my scenario with only 1 bound provided.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed?
  `Only internally used interfaces are changed`
- [x] Any backward compatibility impacted?
  `No`
- [x] Document update required?
  `Yes, added the usage of this feature to documents`
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
    `Yes`
  - How it is tested? Please attach test report.
    `Tested in 3-node cluster and local machine`
  - Is it a performance related change? Please attach the performance test report.
    `Yes. After implementing this feature, data load performance improved by about 25% (80MB/s/Node -> 102MB/s/Node) in my scenario with only 1 bound provided.`
  - Any additional information to help reviewers in testing this change.
    `I refactored the bucket related feature so that range and bucket share similar logic`
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
  `Not related`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0208_support_specifying_sort_column_bounds

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1953.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1953

----
commit 11463dd22db17f2e1858e0a1f3ebfeb07e3ec0e9
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-02-08T08:30:09Z

    Support specifying sort column bounds in data loading

    Enhance data loading performance by specifying sort column bounds
    1. Add row range number during convert-process-step
    2. Dispatch rows to each sorter by range number
    3. Sort/Write process step can be done concurrently in each range

    Tests added and docs updated
----

---
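For readers following the three steps in the description, the idea can be sketched roughly as below. This is a minimal illustration in Python, not the code from this PR: the names `range_id` and `load_with_bounds` are hypothetical, the rule that a key equal to a bound goes to the upper range is an assumption, and the thread pool only mirrors the "one sorter per range" structure rather than the real CarbonData load pipeline.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def range_id(sort_key, bounds):
    """Step 1: compute the range number of a row from its sort key.

    `bounds` is a sorted list of N boundary values that split the key
    space into N + 1 ranges. Assumption: a key equal to a bound is
    placed in the upper range.
    """
    return bisect_right(bounds, sort_key)


def load_with_bounds(rows, bounds, key=lambda row: row[0]):
    """Dispatch rows by range number, then sort each range independently."""
    # Step 2: dispatch every row to the bucket (sorter) of its range.
    buckets = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        buckets[range_id(key(row), bounds)].append(row)

    # Step 3: each range is sorted on its own worker. Because the ranges
    # do not overlap, no merge across ranges is needed afterwards.
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        return list(pool.map(lambda bucket: sorted(bucket, key=key), buckets))


if __name__ == "__main__":
    rows = [(5, "e"), (1, "a"), (9, "i"), (3, "c"), (7, "g")]
    # A single bound splits the data into two ranges.
    for number, part in enumerate(load_with_bounds(rows, bounds=[5])):
        print(number, part)
```

Since the ranges are disjoint and ordered, concatenating the per-range results yields a globally sorted sequence, which is why the sort and write steps of each range can proceed without waiting for the others.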
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2346/

---
In reply to this post by qiuchenjian-2
GitHub user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/1953

This PR depends on #1952.

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3586/

---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1953

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3435/

---
GitHub user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/1953

retest this please

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3603/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2365/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2367/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3605/

---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1953

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3454/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3637/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2398/

---
GitHub user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/1953

retest this please

---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1953

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3477/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3640/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2402/

---
GitHub user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1953

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3479/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2444/

---
GitHub user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1953

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3684/

---