[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...


[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2970/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1740/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3246/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3336/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2100/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3369/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2133/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    retest this please


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3577/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2339/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    this PR depends on #1952


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3427/



---

[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1808#discussion_r166936712
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java ---
    @@ -114,4 +114,14 @@
        */
       public static final int MAX_EXTERNAL_DICTIONARY_SIZE = 10000000;
     
    +  /**
    +   * enable block size based block allocation while loading data. By default, carbondata assigns
    +   * blocks to node based on block number. If this option is set to `true`, carbondata will
    +   * consider block size first and make sure that all the nodes will process almost equal size of
    +   * data. This option is especially useful when you encounter skewed data.
    +   */
    +  @CarbonProperty
    +  public static final String ENABLE_CARBON_LOAD_SKEWED_DATA_OPTIMIZATION
    +      = "carbon.load.skewed.data.optimization";
    --- End diff --
   
    change to `carbon.load.skewedDataOptimization.enabled`
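
    For illustration, a minimal sketch of how a boolean load option like this one could be read with a default of `false`, matching the Javadoc in the diff (the option is off unless set to `true`). The class `SkewedLoadOption` and its method are hypothetical and are not CarbonData's `CarbonProperties` API. The key is the one proposed in the diff, and the review asks for a different spelling, so the final name may differ.

    ```java
    import java.util.Properties;

    // Hypothetical helper for illustration only; not part of CarbonData.
    final class SkewedLoadOption {
      // Key as it appears in the diff under review; the final name may differ.
      static final String KEY = "carbon.load.skewed.data.optimization";

      private SkewedLoadOption() {
      }

      // Returns true only when the property is present and equals "true"
      // (case-insensitive, surrounding whitespace ignored). Missing means false.
      static boolean isEnabled(Properties props) {
        String raw = props.getProperty(KEY, "false");
        return Boolean.parseBoolean(raw.trim());
      }
    }
    ```

    Defaulting to `false` keeps the existing number-based allocation unless a user opts in.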


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    retest this please


---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3598/



---

[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1808
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2360/



---

[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1808#discussion_r167115544
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java ---
    @@ -420,4 +436,16 @@ public String getDataMapWriterPath() {
       public void setDataMapWriterPath(String dataMapWriterPath) {
         this.dataMapWriterPath = dataMapWriterPath;
       }
    +
    +  @Override public String toString() {
    --- End diff --
   
    move @Override to previous line
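
    A short sketch of the requested layout, with `@Override` on its own line above the method signature. The class `BlockSummary` and its fields are hypothetical and stand in for `TableBlockInfo` only to show annotation placement.

    ```java
    // Hypothetical class, used only to illustrate annotation placement.
    final class BlockSummary {
      private final String path;
      private final long length;

      BlockSummary(String path, long length) {
        this.path = path;
        this.length = length;
      }

      // The annotation sits on the line before the method declaration.
      @Override
      public String toString() {
        return "BlockSummary{path=" + path + ", length=" + length + "}";
      }
    }
    ```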


---

[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1808#discussion_r167115611
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ---
    @@ -1184,6 +1184,17 @@ public String getSortTempCompressor() {
           return CarbonCommonConstants.CARBON_SORT_TEMP_COMPRESSOR_DEFAULT;
         }
       }
    +
    +  /**
    +   * whether optimization for skewed data is enabled
    +   * @return true, if enabled; false for not enabled.
    +   */
    +  public boolean isCarbonLoadSkewedDataOptimizationEnabled() {
    --- End diff --
   
    change to `isLoadSkewedDataOptimizationEnabled`


---

[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1808#discussion_r167115897
 
    --- Diff: docs/useful-tips-on-carbondata.md ---
    @@ -169,5 +169,6 @@
       | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. |
       | carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use all YARN local directories during data load for disk load balance, that will improve the data load performance. Please enable this property when you encounter disk hotspot problem during data loading. |
       | carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of compressor to compress the intermediate sort temporary files during sort procedure in data loading. | The optional values are 'SNAPPY','GZIP','BZIP2','LZ4' and empty. By default, empty means that Carbondata will not compress the sort temp files. This parameter will be useful if you encounter disk bottleneck. |
    +  | carbon.load.skewed.data.optimization | spark/carbonlib/carbon.properties | Data loading | Whether to enable size based block allocation strategy for data loading. | Carbondata will use number based block allocation strategy by default and it will make sure that all the executors process the same number of blocks. If this value is set to true, Carbondata will make sure that all the executors process the same size of data -- It's useful if the size of your input data files varies widely, say 1MB~1GB. |
    --- End diff --
   
    `Carbondata will use number based block allocation strategy by default` change to
    `When loading, carbondata will use file size based block allocation strategy for task distribution`


---

[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1808#discussion_r167119304
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java ---
    @@ -70,6 +79,22 @@
       private CarbonLoaderUtil() {
       }
     
    +  /**
    +   * strategy for assign blocks to nodes/executors
    +   */
    +  public enum BlockAssignmentStrategy {
    +    BLOCK_NUM_FIRST("Assign blocks to node base on number of blocks"),
    +    BLOCK_SIZE_FIRST("Assign blocks to node base on data size of blocks");
    +    private String name;
    +    BlockAssignmentStrategy(String name) {
    +      this.name = name;
    +    }
    +
    +    @Override public String toString() {
    --- End diff --
   
    move @Override to previous line, please follow this in future
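
    To make the difference between the two strategies in the enum concrete, here is a rough sketch. It is not the algorithm in `CarbonLoaderUtil`, which this thread does not show. It assumes that "number first" can be approximated by round-robin and "size first" by a greedy largest-block-to-least-loaded-node pass, and it ignores data locality. The class `BlockAssignmentSketch` and its methods are hypothetical.

    ```java
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Illustrative only: a simplified stand-in for block-to-node assignment.
    final class BlockAssignmentSketch {
      enum Strategy { BLOCK_NUM_FIRST, BLOCK_SIZE_FIRST }

      private BlockAssignmentSketch() {
      }

      // Returns, for each node index, the sizes of the blocks assigned to it.
      static List<List<Long>> assign(List<Long> blockSizes, int nodeCount, Strategy strategy) {
        if (nodeCount <= 0) {
          throw new IllegalArgumentException("nodeCount must be positive");
        }
        List<List<Long>> result = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
          result.add(new ArrayList<>());
        }
        if (strategy == Strategy.BLOCK_NUM_FIRST) {
          // Assumption: round-robin, so every node gets about the same block count.
          for (int i = 0; i < blockSizes.size(); i++) {
            result.get(i % nodeCount).add(blockSizes.get(i));
          }
          return result;
        }
        // Assumption: greedy pass, largest block first, to the least-loaded node.
        long[] load = new long[nodeCount];
        List<Long> sorted = new ArrayList<>(blockSizes);
        sorted.sort(Comparator.reverseOrder());
        for (long size : sorted) {
          int target = 0;
          for (int n = 1; n < nodeCount; n++) {
            if (load[n] < load[target]) {
              target = n;
            }
          }
          result.get(target).add(size);
          load[target] += size;
        }
        return result;
      }

      // Sums the sizes assigned to one node.
      static long total(List<Long> sizes) {
        long sum = 0;
        for (long s : sizes) {
          sum += s;
        }
        return sum;
      }
    }
    ```

    With block sizes 1000, 1, 1000, 1 and two nodes, the round-robin pass gives the nodes totals of 2000 and 2, while the greedy pass gives 1001 and 1001. That is the kind of skew the size-based option is meant to even out.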


---