[GitHub] carbondata pull request #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...


qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/1632

    [CARBONDATA-1839] [DataLoad]Fix bugs in compressing sort temp files

    Be sure to complete the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [X] Any interfaces changed?
      `YES, ONLY CHANGE INTERNAL INTERFACES`
     - [X] Any backward compatibility impacted?
      `NO`
     - [X] Document update required?
      `YES`
     - [X] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            `ADDED TESTS`
            - How it is tested? Please attach test report.
            `TESTED IN LOCAL CLUSTER`
            - Is it a performance related change? Please attach the performance test report.
            `YES`
            - Any additional information to help reviewers in testing this change.
            `Some duplicate code for writing sort temp files was found while fixing this bug; I plan to optimize it in a follow-up PR, not in this one.`
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
            `NOT RELATED`
   
    RESOLVE
    ===
   
    1. Fix bugs in compressing sort temp files
   
    2. Reduce duplicate code in reading and writing sort temp files
     and make it more readable
   
    3. Optimize the sort procedure:
   
    Before:
   
    ```flow
    st=>start: raw row that has been converted (call it 'RawRow' for short)
    e=>end: write 'PartedRow' to DataFile in write procedure
    op1=>operation: read RawRow from temp sort file
    op2=>operation: sort on RawRow
    op3=>operation: write RawRow to temp sort file
    cond=>condition: final sort?
    op4=>operation: sort on RawRow
    op5=>operation: convert each RawRow to 3 'PartedRow'
   
    st->op2->op3->cond
    cond(no)->op1->op2
    cond(yes)->op4->op5->e
    ```
    After:
   
    ```flow
    st=>start: raw row that has been converted (call it 'RawRow' for short)
    e=>end: write 'PartedRow' to DataFile in write procedure
    op1=>operation: convert RawRow to 3 'PartedRow'
    op2=>operation: read PartedRow from temp sort file
    op3=>operation: sort on PartedRow
    op4=>operation: write PartedRow to temp sort file
    cond=>condition: final sort?
    op5=>operation: sort on PartedRow
   
    st->op1->op3->op4->cond
    cond(no)->op2->op3
    cond(yes)->op5->e
    ```
   
    4. Add tests that enable sort_temp_file_compressed during data loading
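
The "After" flow in item 3 can be illustrated with a minimal Java sketch. It is not the CarbonData implementation: the class `SortTempSketch`, the simplified `PartedRow` (one integer sort key plus a string, rather than the three parts mentioned above), and the use of GZIP from `java.util.zip` as the temp-file compressor are all assumptions made for illustration. The point it shows is only the ordering: each raw row is converted once, and every later sort, write, and read works on the converted form.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SortTempSketch {

    /** Hypothetical, simplified 'PartedRow': the sort key is kept apart from the rest. */
    public static final class PartedRow {
        public final int sortKey;
        public final String rest;

        public PartedRow(int sortKey, String rest) {
            this.sortKey = sortKey;
            this.rest = rest;
        }
    }

    private static final Comparator<PartedRow> BY_KEY =
        Comparator.comparingInt(row -> row.sortKey);

    /** Convert a raw row (assumed here to be {Integer key, String payload}) exactly once. */
    public static PartedRow convert(Object[] rawRow) {
        return new PartedRow((Integer) rawRow[0], (String) rawRow[1]);
    }

    /** Sort one batch of PartedRows and write it to a GZIP-compressed temp file. */
    static Path writeSortedRun(List<PartedRow> batch) throws IOException {
        batch.sort(BY_KEY);
        Path file = Files.createTempFile("sorttemp", ".gz");
        try (DataOutputStream out =
                 new DataOutputStream(new GZIPOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(batch.size()); // row count first, so the reader knows when to stop
            for (PartedRow row : batch) {
                out.writeInt(row.sortKey);
                out.writeUTF(row.rest);
            }
        }
        return file;
    }

    /** Read a compressed temp file back into PartedRows; no re-conversion is needed. */
    static List<PartedRow> readRun(Path file) throws IOException {
        List<PartedRow> rows = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new GZIPInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int key = in.readInt();
                rows.add(new PartedRow(key, in.readUTF()));
            }
        }
        return rows;
    }

    /** Convert, spill sorted batches to temp files, then read them back and do the final sort. */
    public static List<PartedRow> sortAll(List<Object[]> rawRows, int batchSize) {
        try {
            List<Path> runs = new ArrayList<>();
            List<PartedRow> batch = new ArrayList<>();
            for (Object[] raw : rawRows) {
                batch.add(convert(raw));
                if (batch.size() == batchSize) {
                    runs.add(writeSortedRun(batch));
                    batch = new ArrayList<>();
                }
            }
            if (!batch.isEmpty()) {
                runs.add(writeSortedRun(batch));
            }
            // Final sort: simplified to an in-memory sort instead of a streaming merge.
            List<PartedRow> result = new ArrayList<>();
            for (Path run : runs) {
                result.addAll(readRun(run));
                Files.delete(run);
            }
            result.sort(BY_KEY);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Object[]> raw = new ArrayList<>();
        for (int key : new int[] {5, 1, 4, 2, 3}) {
            raw.add(new Object[] {key, "row" + key});
        }
        for (PartedRow row : sortAll(raw, 2)) {
            System.out.println(row.sortKey + " " + row.rest);
        }
    }
}
```

A real loader would merge the runs in a streaming fashion rather than loading them all into memory; the in-memory final sort here only keeps the sketch short.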

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata bug_sort_temp_compress_1207

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1632
   
----
commit fb46e1288ae3150700a6508298f1ec9dcc8d37c2
Author: xuchuanyin <[hidden email]>
Date:   2017-12-07T08:31:58Z

    Fix bugs in compressing sort temp file
   
    1. fix bugs in compressing sort temp files
   
    2. reduce duplicate code in reading and writing sort temp files
     and make it more readable
   
    3. optimize sort procedure:
   
    Before:
     raw row that has been converted (call it 'RawRow' for short) ->
     sort on RawRow ->
     write RawRow to temp sort file ->
     read RawRow from temp sort file ->
     sort on RawRow -> ... ->
     at the final sort, sort on RawRow and convert the RawRow to 3 'PartedRow' ->
     write 'PartedRow' to DataFile in write procedure.
   
    After:
     raw row that has been converted (call it 'RawRow' for short) ->
     convert RawRow to 3 'PartedRow' ->
     sort on PartedRow ->
     write PartedRow to temp sort file ->
     read PartedRow from temp sort file ->
     sort on PartedRow -> ... ->
     at the final sort, sort on PartedRow ->
     write 'PartedRow' to DataFile in write procedure.
   
    4. add tests

----


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/577/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    retest this please


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/612/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1840/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/615/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1843/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    retest this please


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1847/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/619/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/634/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1863/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    SDV Build Failed, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2210/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    retest this please


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/653/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1884/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1911/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/682/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2244/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1915/



---