[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...


[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    LGTM


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    merged into carbonstore branch


---

[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin closed the pull request at:

    https://github.com/apache/carbondata/pull/1808


---

[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
GitHub user xuchuanyin reopened a pull request:

    https://github.com/apache/carbondata/pull/1808

    [CARBONDATA-2023][DataLoad] Add size base block allocation in data loading

    CarbonData assigns blocks to nodes at the beginning of data loading.
    The previous block allocation strategy is based on block count, so it
    suffers from data skew when the sizes of the input files differ a lot.
   
    We introduced a size-based block allocation strategy to optimize data
    loading performance in skewed-data scenarios.
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [x] Any interfaces changed?
     `Only changed the internal interfaces`
     - [x] Any backward compatibility impacted?
     `No`
     - [x] Document update required?
    `Updated the document`
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
    `Added tests to verify the block-allocation correctness`
            - How it is tested? Please attach test report.
    `Tested in local 3-node cluster`
            - Is it a performance related change? Please attach the performance test report.
    ```
    In my scenario, input data file sizes range from 1KB to about 5GB.
    Before enabling this feature (block number based allocation), each executor
     processed the same number of blocks, and the processed data size differed by 5X.
    After enabling this feature (block size based allocation), each executor
     processed almost the same amount of data, and the number of processed blocks differed by 6X.
   
    Data loading throughput improved from 41MB/s/Node to 61MB/s/Node,
    a gain of about 50%.
    ```
   
            - Any additional information to help reviewers in testing this change.
           `I refactored the code to make it more readable. The core code mainly lies in CarbonLoaderUtil`
     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    `Not related`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata opt_size_base_block_allocation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1808.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1808
   
----
commit eec35eae52affdfdbd05915ece2f19bedec5e310
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-02-08T06:42:39Z

    Add size based block allocation strategy in data loading
   
    CarbonData assigns blocks to nodes at the beginning of data loading.
    The previous block allocation strategy is based on block count, so it
    suffers from data skew when the sizes of the input files differ a lot.
   
    We introduced a size-based block allocation strategy to optimize data
    loading performance in skewed-data scenarios.

commit da4f93dd7bbe9faa045c751cba7ae1dd22ce12e4
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-02-09T02:25:42Z

    Fix review comments

----


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3656/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3494/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2417/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2446/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3686/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    retest sdv please


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    retest this please


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3718/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    retest this please


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2478/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3721/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2481/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    LGTM


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3520/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Merged to carbonstore branch


---

[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin closed the pull request at:

    https://github.com/apache/carbondata/pull/1808


---