[GitHub] carbondata pull request #1953: [CARBONDATA-2091][DataLoad] Support specifyin...

52 messages

qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/1953

    [CARBONDATA-2091][DataLoad] Support specifying sort column bounds in data loading

    Enhance data loading performance by letting users specify sort column bounds:
    1. Add a row range number during the convert process step
    2. Dispatch rows to each sorter by range number
    3. Run the sort/write process step concurrently for each range
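The three steps above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class and method names (`RangeDispatchSketch`, `rangeOf`, `dispatchAndSort`) are hypothetical, each row is reduced to a single `int` sort key, and the bounds are assumed to be strictly increasing. With n bounds there are n + 1 ranges. Every key in range i is smaller than every key in range i + 1, so each range can be sorted (and written) independently while the concatenated output stays globally ordered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of range-based dispatch; not CarbonData's actual classes.
public class RangeDispatchSketch {

  // Step 1: compute the range number of a sort key from the (strictly increasing) bounds.
  // n bounds define n + 1 ranges; a key equal to a bound falls into the range above it.
  static int rangeOf(int key, int[] bounds) {
    int idx = Arrays.binarySearch(bounds, key);
    // binarySearch returns -(insertionPoint) - 1 when the key is not one of the bounds
    return idx >= 0 ? idx + 1 : -(idx + 1);
  }

  // Steps 2 and 3: dispatch keys to per-range lists, then sort every range concurrently.
  static List<List<Integer>> dispatchAndSort(List<Integer> keys, int[] bounds) {
    int rangeCount = bounds.length + 1;
    List<List<Integer>> ranges = new ArrayList<>();
    for (int i = 0; i < rangeCount; i++) {
      ranges.add(new ArrayList<>());
    }
    for (int key : keys) {
      ranges.get(rangeOf(key, bounds)).add(key);
    }

    ExecutorService pool = Executors.newFixedThreadPool(rangeCount);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (List<Integer> range : ranges) {
        // Each task owns exactly one range, so the sorters never share a list.
        futures.add(pool.submit(() -> range.sort(null)));
      }
      for (Future<?> future : futures) {
        future.get(); // wait for every range and surface any sorting failure
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while sorting ranges", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("sorting a range failed", e.getCause());
    } finally {
      pool.shutdown();
    }
    return ranges;
  }

  public static void main(String[] args) {
    int[] bounds = {10, 20}; // two bounds -> three ranges
    List<Integer> keys = List.of(25, 3, 17, 10, 42, 8, 20, 11);
    System.out.println(dispatchAndSort(keys, bounds));
  }
}
```

Running `main` prints `[[3, 8], [10, 11, 17], [20, 25, 42]]`: each inner list was sorted by its own thread, and reading the ranges in order yields the fully sorted key set.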
   
    Tests added and docs updated
   
    With this feature, data load performance improved by about 25% (80 MB/s/node -> 102 MB/s/node) in my scenario, with only one bound provided.
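As a quick check on the throughput figures quoted above, the relative per-node gain works out to:

```latex
\frac{102 - 80}{80} = \frac{22}{80} = 0.275 = 27.5\%
```

so "about 25%" is a slightly conservative rounding of the measured improvement.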
   
    Be sure to complete all items in the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [x] Any interfaces changed?
     `Only internally used interfaces are changed`
     - [x] Any backward compatibility impacted?
     `No`
     - [x] Document update required?
    `Yes, added usage of this feature to the documentation`
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
    `Yes`
            - How it is tested? Please attach test report.
    `Tested in 3-node cluster and local machine`
            - Is it a performance related change? Please attach the performance test report.
    `Yes. With this feature, data load performance improved by about 25% (80 MB/s/node -> 102 MB/s/node) in my scenario, with only one bound provided.`
            - Any additional information to help reviewers in testing this change.
           `I refactored the bucket-related feature so that ranges and buckets share similar logic`
     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    `Not related`
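The remark about treating ranges and buckets with similar logic can be pictured as one partitioning abstraction with two implementations. The sketch below is hypothetical: `RowPartitioner`, `HashBucketPartitioner` and `RangePartitioner` are illustrative names, not CarbonData classes; the key is assumed to be an `int`; and hash-modulo bucketing is used only as a common example of bucketing, not as a description of CarbonData's bucket hashing. Both implementations map a key to a partition id, so the downstream dispatch, sort and write path can be shared.

```java
import java.util.Arrays;

// Hypothetical sketch: buckets and ranges as two implementations of one partitioner.
public class PartitionerSketch {

  // Common abstraction: map a row's key to a partition id in [0, numPartitions()).
  interface RowPartitioner {
    int numPartitions();

    int partitionOf(int key);
  }

  // Bucketing (assumed hash-modulo): partitions carry no ordering relative to each other.
  static final class HashBucketPartitioner implements RowPartitioner {
    private final int numBuckets;

    HashBucketPartitioner(int numBuckets) {
      this.numBuckets = numBuckets;
    }

    public int numPartitions() {
      return numBuckets;
    }

    public int partitionOf(int key) {
      // floorMod keeps the result non-negative for negative hash codes
      return Math.floorMod(Integer.hashCode(key), numBuckets);
    }
  }

  // Ranges: partition i holds only keys smaller than every key in partition i + 1.
  static final class RangePartitioner implements RowPartitioner {
    private final int[] bounds; // assumed strictly increasing

    RangePartitioner(int[] bounds) {
      this.bounds = bounds.clone();
    }

    public int numPartitions() {
      return bounds.length + 1;
    }

    public int partitionOf(int key) {
      int idx = Arrays.binarySearch(bounds, key);
      return idx >= 0 ? idx + 1 : -(idx + 1);
    }
  }

  public static void main(String[] args) {
    RowPartitioner buckets = new HashBucketPartitioner(4);
    RowPartitioner ranges = new RangePartitioner(new int[] {100});
    for (int key : new int[] {7, 150, -3}) {
      System.out.println(key + " -> bucket " + buckets.partitionOf(key)
          + ", range " + ranges.partitionOf(key));
    }
  }
}
```

The design point is that the loader only needs a partition id per row; the difference is that range partitions are ordered relative to each other, which is what lets each range be sorted and written without a global merge.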


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0208_support_specifying_sort_column_bounds

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1953.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1953
   
----
commit 11463dd22db17f2e1858e0a1f3ebfeb07e3ec0e9
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-02-08T08:30:09Z

    Support specifying sort column bounds in data loading
   
    Enhance data loading performance by specifying sort column bounds
    1. Add row range number during convert-process-step
    2. Dispatch rows to each sorter by range number
    3. Sort/Write process step can be done concurrently in each range
   
    Tests added and docs updated

----


---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2346/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    This PR depends on #1952.


---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3586/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3435/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    retest this please


---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3603/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2365/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2367/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3605/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3454/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3637/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2398/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    retest this please


---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3477/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3640/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2402/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3479/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2444/



---

[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1953
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3684/



---