GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/1632

[CARBONDATA-1839] [DataLoad] Fix bugs in compressing sort temp files

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [X] Any interfaces changed?
  `YES, ONLY CHANGE INTERNAL INTERFACES`
- [X] Any backward compatibility impacted?
  `NO`
- [X] Document update required?
  `YES`
- [X] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
    `ADDED TESTS`
  - How it is tested? Please attach test report.
    `TESTED IN LOCAL CLUSTER`
  - Is it a performance related change? Please attach the performance test report.
    `YES`
  - Any additional information to help reviewers in testing this change.
    `There is some duplicate code in writing sort temp files, found during this bug fix; I plan to optimize it in a successive PR, not in this one.`
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
  `NOT RELATED`

RESOLVE
===
1. Fix bugs in compressing sort temp file
2. Reduce duplicate code in reading & writing sort temp file and make it more readable
3. Optimize sort procedure:

Before:

```flow
st=>start: raw row that has been converted(call it 'RawRow' for short)
e=>end: write 'PartedRow' to DataFile in write procedure
op1=>operation: read RawRow from temp sort file
op2=>operation: sort on RawRow
op3=>operation: write RawRow to temp sort file
cond=>condition: final sort?
op4=>operation: sort on RawRow
op5=>operation: convert each RawRow to 3 'PartedRow'

st->op1->op2->op3->cond
cond(no)->op1
cond(yes)->op4->op5->e
```

After:

```flow
st=>start: raw row that has been converted(call it 'RawRow' for short)
e=>end: write 'PartedRow' to DataFile in write procedure
op1=>operation: convert RawRow to 3 'PartedRow'
op2=>operation: read PartedRow from temp sort file
op3=>operation: sort on PartedRow
op4=>operation: write PartedRow to temp sort file
cond=>condition: final sort?
op5=>operation: sort on PartedRow

st->op1->op2->op3->op4->cond
cond(no)->op2
cond(yes)->op5->e
```

4. Add tests to enable sort_temp_file_compressed while doing data loading

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata bug_sort_temp_compress_1207

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1632

----
commit fb46e1288ae3150700a6508298f1ec9dcc8d37c2
Author: xuchuanyin <[hidden email]>
Date:   2017-12-07T08:31:58Z

    Fix bugs in compressing sort temp file

    1. fix bugs in compressing sort temp file
    2. reduce duplicate code in reading & writing sort temp file and make it more readable
    3. optimize sort procedure:

    Before: raw row that has been converted(call it 'RawRow' for short) ->
    sort on RawRow -> write RawRow to temp sort file ->
    read RawRow from temp sort file -> sort on RawRow -> ... ->
    at the final sort, sort on RawRow and convert the RawRow to 3 'PartedRow' ->
    write 'PartedRow' to DataFile in write procedure.

    After: raw row that has been converted(call it 'RawRow' for short) ->
    convert RawRow to 3 'PartedRow' -> sort on PartedRow ->
    write PartedRow to temp sort file -> read PartedRow from temp sort file ->
    sort on PartedRow -> ... -> at the final sort, sort on PartedRow ->
    write 'PartedRow' to DataFile in write procedure.

    4. add tests

----

---
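The reordered sort procedure in item 3 of the description can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the class `SortTempSketch`, the `PartedRow` fields, and every method name are hypothetical, the three "parts" are assumed to be one sort key, one no-dictionary dimension, and one measure, and GZIP from the JDK stands in for whatever compressor CarbonData actually uses. The point it shows is that each raw row is converted exactly once, before the first sort, so both the sort temp files and the final merge deal only with the parted form.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch only: none of these names are CarbonData APIs.
class SortTempSketch {

    // Stand-in for a 'PartedRow'; the three fields are assumed parts.
    static final class PartedRow {
        final int sortKey;       // assumed: dictionary-encoded sort dimension
        final String noDictDim;  // assumed: no-dictionary dimension
        final long measure;      // assumed: measure value

        PartedRow(int sortKey, String noDictDim, long measure) {
            this.sortKey = sortKey;
            this.noDictDim = noDictDim;
            this.measure = measure;
        }
    }

    static final Comparator<PartedRow> BY_KEY = Comparator.comparingInt(r -> r.sortKey);

    // Convert each raw row once, sort the batch, and spill it to a temp file.
    static Path sortAndSpill(List<Object[]> rawBatch, boolean compress) throws IOException {
        List<PartedRow> rows = new ArrayList<>();
        for (Object[] raw : rawBatch) {
            rows.add(new PartedRow((Integer) raw[0], (String) raw[1], (Long) raw[2]));
        }
        rows.sort(BY_KEY);
        Path file = Files.createTempFile("sorttemp", ".tmp");
        try (OutputStream os = Files.newOutputStream(file);
             DataOutputStream out = new DataOutputStream(compress ? new GZIPOutputStream(os) : os)) {
            out.writeInt(rows.size());
            for (PartedRow r : rows) {
                out.writeInt(r.sortKey);
                out.writeUTF(r.noDictDim);
                out.writeLong(r.measure);
            }
        }
        return file;
    }

    // Read a temp file back; the rows are already in parted form.
    static List<PartedRow> readTemp(Path file, boolean compress) throws IOException {
        try (InputStream is = Files.newInputStream(file);
             DataInputStream in = new DataInputStream(compress ? new GZIPInputStream(is) : is)) {
            int n = in.readInt();
            List<PartedRow> rows = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                rows.add(new PartedRow(in.readInt(), in.readUTF(), in.readLong()));
            }
            return rows;
        }
    }

    // Final sort: k-way merge of the sorted runs. For brevity each run is loaded
    // fully into memory; a real implementation would stream from the files.
    static List<PartedRow> finalMerge(List<Path> files, boolean compress) throws IOException {
        List<List<PartedRow>> runs = new ArrayList<>();
        for (Path f : files) {
            runs.add(readTemp(f, compress));
            Files.deleteIfExists(f);
        }
        // Heap entries are {runIndex, positionInRun}, ordered by the row they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> BY_KEY.compare(runs.get(a[0]).get(a[1]), runs.get(b[0]).get(b[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<PartedRow> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(runs.get(top[0]).get(top[1]));
            if (top[1] + 1 < runs.get(top[0]).size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return merged;
    }

    // Spill two small batches, merge them, and return the sort keys in output order.
    static List<Integer> demo(boolean compress) {
        try {
            List<Path> files = new ArrayList<>();
            files.add(sortAndSpill(Arrays.asList(new Object[] {3, "c", 30L}, new Object[] {1, "a", 10L}), compress));
            files.add(sortAndSpill(Arrays.asList(new Object[] {4, "d", 40L}, new Object[] {2, "b", 20L}), compress));
            List<Integer> keys = new ArrayList<>();
            for (PartedRow r : finalMerge(files, compress)) keys.add(r.sortKey);
            return keys;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
    }
}
```

Because the reader and writer share one row layout and differ only in whether the stream is wrapped by the compressor, the compressed and uncompressed paths run the same serialization code, which is the kind of duplication the description says it wants to reduce.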
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/577/

---
GitHub user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1632

retest this please

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/612/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1840/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/615/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1843/

---
GitHub user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1632

retest this please

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1847/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/619/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/634/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1863/

---
GitHub user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1632

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2210/

---
GitHub user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1632

retest this please

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/653/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1884/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1911/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/682/

---
GitHub user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1632

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2244/

---
GitHub user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1632

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1915/

---