[GitHub] carbondata pull request #1707: [CARBONDATA-1839] [DataLoad] Fix bugs and opt...

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/1707

    [CARBONDATA-1839] [DataLoad] Fix bugs and optimize in compressing sort temp files

    1. Fix bugs in compressing sort temp files: use file-level compression
    instead of batch-record-level compression.
   
    2. Reduce duplicate code in reading and writing sort temp files
     and make it more readable.
   
    3. Optimize the sort procedure:
   
    Before:
     raw row that has been converted (call it 'RawRow' for short) ->
     sort on RawRow ->
     write RawRow to sort temp file ->
     read RawRow from sort temp file ->
     sort on RawRow -> ... ->
     at the final sort, sort on RawRow and convert each RawRow to a 3-part row ('PartedRow') ->
     write PartedRow to the data file in the write procedure.
   
    After:
     raw row that has been converted (call it 'RawRow' for short) ->
     convert each RawRow to a 3-part row ('PartedRow') ->
     sort on PartedRow ->
     write PartedRow to sort temp file ->
     read PartedRow from sort temp file ->
     sort on PartedRow -> ... ->
     at the final sort, sort on PartedRow ->
     write PartedRow to the data file in the write procedure.
   
    4. add tests
   
    5. remove unused code
   
    6. update docs, add property to configure the compressor
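
The "After" flow in item 3 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class `PartedRowSketch`, the type `PartedRow`, and the assumption that the three parts are dictionary-encoded sort columns, raw-byte sort columns, and the remaining columns are all hypothetical. The point it shows is that each row is converted once, up front, and every later sort works on the converted form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: none of these names are CarbonData classes.
class PartedRowSketch {

  /** A row split into three parts (the exact split is an assumption). */
  static final class PartedRow {
    final int[] dictSortDims;      // sort columns stored as dictionary ints
    final byte[][] noDictSortDims; // sort columns stored as raw bytes
    final Object[] rest;           // columns that do not take part in sorting

    PartedRow(int[] dictSortDims, byte[][] noDictSortDims, Object[] rest) {
      this.dictSortDims = dictSortDims;
      this.noDictSortDims = noDictSortDims;
      this.rest = rest;
    }
  }

  /** Converts a raw row laid out as [dict cols..., no-dict cols..., rest...]. */
  static PartedRow convert(Object[] rawRow, int dictCount, int noDictCount) {
    int[] dict = new int[dictCount];
    for (int i = 0; i < dictCount; i++) {
      dict[i] = (Integer) rawRow[i];
    }
    byte[][] noDict = new byte[noDictCount][];
    for (int i = 0; i < noDictCount; i++) {
      noDict[i] = (byte[]) rawRow[dictCount + i];
    }
    Object[] rest = Arrays.copyOfRange(rawRow, dictCount + noDictCount, rawRow.length);
    return new PartedRow(dict, noDict, rest);
  }

  /** Unsigned lexicographic comparison of two byte arrays. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** Orders rows by the dictionary sort columns, then the raw-byte sort columns. */
  static final Comparator<PartedRow> ORDER = (a, b) -> {
    for (int i = 0; i < a.dictSortDims.length; i++) {
      int c = Integer.compare(a.dictSortDims[i], b.dictSortDims[i]);
      if (c != 0) {
        return c;
      }
    }
    for (int i = 0; i < a.noDictSortDims.length; i++) {
      int c = compareBytes(a.noDictSortDims[i], b.noDictSortDims[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  };

  /** Converts every raw row once, then sorts the converted rows. */
  static List<PartedRow> convertThenSort(List<Object[]> rawRows, int dictCount, int noDictCount) {
    List<PartedRow> rows = new ArrayList<>(rawRows.size());
    for (Object[] raw : rawRows) {
      rows.add(convert(raw, dictCount, noDictCount));
    }
    rows.sort(ORDER);
    return rows;
  }
}
```

Because the rows written to and read from the sort temp file are already in the parted form, intermediate and final merge passes can reuse the same comparator without converting again.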
   
    Please refer to the [mailing list discussion](http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Compression-for-sort-temp-files-in-Carbomdata-td31747.html) for more information.
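
As a rough illustration of what "file-level compression" means in item 1, the sketch below wraps the whole temp file's stream in a single compressing stream, so the reader and writer deal only with plain row data. It is a sketch under stated assumptions: the JDK's GZIP streams stand in for whatever compressor the new property selects, and `FileLevelCompressionSketch`, `writeRows`, `readRows`, and `roundTrip` are hypothetical names, not CarbonData APIs.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: GZIP is a stand-in compressor, and the row format is made up.
class FileLevelCompressionSketch {

  /** Writes all rows through one compressing stream that covers the whole file. */
  static void writeRows(Path file, int[][] rows) throws IOException {
    try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
        new GZIPOutputStream(Files.newOutputStream(file))))) {
      out.writeInt(rows.length);
      for (int[] row : rows) {
        out.writeInt(row.length);
        for (int value : row) {
          out.writeInt(value);
        }
      }
    } // closing the outermost stream finishes the GZIP trailer
  }

  /** Reads the rows back; decompression is handled entirely by the stream wrapper. */
  static int[][] readRows(Path file) throws IOException {
    try (DataInputStream in = new DataInputStream(new BufferedInputStream(
        new GZIPInputStream(Files.newInputStream(file))))) {
      int rowCount = in.readInt();
      int[][] rows = new int[rowCount][];
      for (int i = 0; i < rowCount; i++) {
        rows[i] = new int[in.readInt()];
        for (int j = 0; j < rows[i].length; j++) {
          rows[i][j] = in.readInt();
        }
      }
      return rows;
    }
  }

  /** Convenience helper: writes rows to a temp file, reads them back, cleans up. */
  static int[][] roundTrip(int[][] rows) {
    try {
      Path file = Files.createTempFile("sorttemp", ".gz");
      try {
        writeRows(file, rows);
        return readRows(file);
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this layout the row-writing code needs no per-batch compress and decompress bookkeeping; swapping the compressor only changes which stream wrapper is constructed.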
   
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [X] Any interfaces changed?
      `YES, ONLY CHANGE INTERNAL INTERFACES`
     - [X] Any backward compatibility impacted?
      `NO`
     - [X] Document update required?
      `YES, RELATED DOCUMENT HAS BEEN UPDATED`
     - [X] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            `ADDED TESTS`
            - How it is tested? Please attach test report.
            `TESTED IN LOCAL CLUSTER`
            - Is it a performance related change? Please attach the performance test report.
            `YES`
            - Any additional information to help reviewers in testing this change.
            `The key point lies in` **`SortStepRowHandler`**`. It is used to convert raw row to 3-parted row and read/write row from/to sort temp file/unsafe memory`
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
            `NOT RELATED`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata bug_compress_sort_temp_1222

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1707
   
----
commit 78684a172f0346584ee992bfc40750b03a9f814b
Author: xuchuanyin <xuchuanyin@...>
Date:   2017-12-07T08:31:58Z

    Fix bugs in compressing sort temp file
   
    1. fix bugs in compressing sort temp file, use file-level compression
    instead of batch-record-level compression
   
    2. reduce duplicate code in reading & writing sort temp file
     and make it more readable
   
    3. optimize sort procedure:
   
    Before:
     raw row that has been converted(call it 'RawRow' for short) ->
     sort on RawRow ->
     write RawRow to temp sort file ->
     read RawRow from temp sort file ->
     sort on RawRow -> ... ->
     at the final sort, sort on RawRow and convert the RawRow to 3 'PartedRow' ->
     write 'PartedRow' to DataFile in write procedure.
   
    After:
     raw row that has been converted(call it 'RawRow' for short) ->
     convert RawRow to 3 'PartedRow' ->
     sort on PartedRow ->
     write PartedRow to temp sort file ->
     read PartedRow from temp sort file ->
     sort on PartedRow -> ... ->
     at the final sort, sort on PartedRow ->
     write 'PartedRow' to DataFile in write procedure.
   
    4. add tests
   
    5. remove unused code
   
    6. update docs, add property to configure the compressor

----


---
[GitHub] carbondata issue #1707: [CARBONDATA-1839] [DataLoad] Fix bugs and optimize i...

qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    This PR is to replace PR #1632


---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2247/



---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1024/



---
qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2506/



---
qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    retest this please


---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2294/



---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1079/



---
qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    retest this please


---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2306/



---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1090/



---
qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    retest this please


---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1108/



---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2325/



---
qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    retest this please


---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1119/



---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2333/



---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2339/



---
qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    retest this please


---
qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1707
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2340/



---