GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/1707

[CARBONDATA-1839] [DataLoad] Fix bugs and optimize in compressing sort temp files

1. fix bugs in compressing sort temp file, use file-level compression instead of batch-record-level compression
2. reduce duplicate code in reading & writing sort temp file and make it more readable
3. optimize sort procedure:

   Before: raw row that has been converted (call it 'RawRow' for short) -> sort on RawRow -> write RawRow to temp sort file -> read RawRow from temp sort file -> sort on RawRow -> ... -> at the final sort, sort on RawRow and convert the RawRow to 3 'PartedRow' -> write 'PartedRow' to DataFile in write procedure.

   After: raw row that has been converted (call it 'RawRow' for short) -> convert RawRow to 3 'PartedRow' -> sort on PartedRow -> write PartedRow to temp sort file -> read PartedRow from temp sort file -> sort on PartedRow -> ... -> at the final sort, sort on PartedRow -> write 'PartedRow' to DataFile in write procedure.
4. add tests
5. remove unused code
6. update docs, add property to configure the compressor

Please refer to [maillist](http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Compression-for-sort-temp-files-in-Carbomdata-td31747.html) to get more information

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [X] Any interfaces changed?
  `YES, ONLY CHANGE INTERNAL INTERFACES`
- [X] Any backward compatibility impacted?
  `NO`
- [X] Document update required?
  `YES, RELATED DOCUMENT HAS BEEN UPDATED`
- [X] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
    `ADDED TESTS`
  - How it is tested? Please attach test report.
    `TESTED IN LOCAL CLUSTER`
  - Is it a performance related change? Please attach the performance test report.
    `YES`
  - Any additional information to help reviewers in testing this change.
    `The key point lies in` **`SortStepRowHandler`**`. It is used to convert raw row to 3-parted row and read/write row from/to sort temp file/unsafe memory`
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
  `NOT RELATED`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata bug_compress_sort_temp_1222

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1707.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1707

----
commit 78684a172f0346584ee992bfc40750b03a9f814b
Author: xuchuanyin <xuchuanyin@...>
Date: 2017-12-07T08:31:58Z

    Fix bugs in compressing sort temp file

    1. fix bugs in compressing sort temp file, use file-level compression instead of batch-record-level compression
    2. reduce duplicate code in reading & writing sort temp file and make it more readable
    3. optimize sort procedure:

       Before: raw row that has been converted (call it 'RawRow' for short) -> sort on RawRow -> write RawRow to temp sort file -> read RawRow from temp sort file -> sort on RawRow -> ... -> at the final sort, sort on RawRow and convert the RawRow to 3 'PartedRow' -> write 'PartedRow' to DataFile in write procedure.

       After: raw row that has been converted (call it 'RawRow' for short) -> convert RawRow to 3 'PartedRow' -> sort on PartedRow -> write PartedRow to temp sort file -> read PartedRow from temp sort file -> sort on PartedRow -> ... -> at the final sort, sort on PartedRow -> write 'PartedRow' to DataFile in write procedure.
    4. add tests
    5. remove unused code
    6. update docs, add property to configure the compressor

----

---
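As a rough illustration of item 1 in the description above (file-level rather than batch-record-level compression), here is a minimal sketch. It is not CarbonData code: the function names `write_sort_temp_file` and `read_sort_temp_file`, the fixed-width `(int, long)` record layout, and the use of Python's standard `gzip` module in place of whichever compressor the new property selects are all assumptions made for the example. The point it shows is that a single compressed stream wraps the whole temp file, so the reader and writer never deal with per-batch compressed blocks.

```python
import gzip
import os
import struct
import tempfile

# Hypothetical record layout for the sketch: a 4-byte int key and an
# 8-byte long value, big-endian, no padding (12 bytes per row).
ROW_FORMAT = ">iq"
ROW_SIZE = struct.calcsize(ROW_FORMAT)


def write_sort_temp_file(path, rows, compress=True):
    """Write already-sorted rows to a temp file.

    With compress=True the whole file is one gzip stream (file-level
    compression); rows are written one by one into that stream.
    """
    opener = gzip.open if compress else open
    with opener(path, "wb") as out:
        out.write(struct.pack(">i", len(rows)))  # row-count header
        for key, value in rows:
            out.write(struct.pack(ROW_FORMAT, key, value))


def read_sort_temp_file(path, compress=True):
    """Read the rows back in the order they were written."""
    opener = gzip.open if compress else open
    with opener(path, "rb") as src:
        (count,) = struct.unpack(">i", src.read(4))
        return [struct.unpack(ROW_FORMAT, src.read(ROW_SIZE))
                for _ in range(count)]


if __name__ == "__main__":
    rows = sorted((i % 7, i) for i in range(1000))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.sorttemp")
        write_sort_temp_file(path, rows)
        assert read_sort_temp_file(path) == rows
```

Because the compression lives entirely in the stream that is opened, the row-level read and write code is the same for compressed and uncompressed files, which is presumably part of what makes the duplicate-code reduction in item 2 possible.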
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1707

This PR replaces PR #1632.

---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2247/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1024/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1707

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2506/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1707

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2294/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1079/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1707

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2306/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1090/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1707

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1108/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2325/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1707

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1119/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2333/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2339/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1707

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1707

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2340/

---