[GitHub] carbondata pull request #2936: [WIP] Parallelize block pruning of default da...


[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1529/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9787/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1739/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2936
 
    retest this please


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2936
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1543/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2936


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1754/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9802/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r236564769
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -63,6 +75,8 @@
     
       private SegmentPropertiesFetcher segmentPropertiesFetcher;
     
    +  private static final Log LOG = LogFactory.getLog(TableDataMap.class);
    --- End diff --
   
    We do not use Apache Commons Logging in the CarbonData project. Please take care of this.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r236565153
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
     
       public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
     
    +  /**
    +   * max driver threads used for block pruning [1 to 4 threads]
    +   */
    +  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
    +      "carbon.max.driver.threads.for.block.pruning";
    --- End diff --
   
    I think it's better to use the name
    `carbon.query.pruning.parallelism.driver`
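The Javadoc in this diff bounds the property to 1 to 4 threads, with a default of "4". A minimal sketch of how such a value might be read and validated, assuming `java.util.Properties` as a stand-in for CarbonData's own property loader, and assuming (hypothetically) that non-numeric or out-of-range values fall back to the default rather than being clamped. The key is the one from the diff; adopting the reviewer's suggested name would only change that string.

```java
import java.util.Properties;

class PruningThreadConfig {
  // Key and default copied from the diff above.
  static final String MAX_DRIVER_THREADS_KEY = "carbon.max.driver.threads.for.block.pruning";
  static final String MAX_DRIVER_THREADS_DEFAULT = "4";
  // Allowed range taken from the Javadoc: "[1 to 4 threads]".
  static final int MIN_THREADS = 1;
  static final int MAX_THREADS = 4;

  /** Returns the configured thread count; invalid values fall back to the default (assumption). */
  static int maxDriverThreads(Properties props) {
    int fallback = Integer.parseInt(MAX_DRIVER_THREADS_DEFAULT);
    String raw = props.getProperty(MAX_DRIVER_THREADS_KEY, MAX_DRIVER_THREADS_DEFAULT);
    try {
      int value = Integer.parseInt(raw.trim());
      return (value >= MIN_THREADS && value <= MAX_THREADS) ? value : fallback;
    } catch (NumberFormatException e) {
      return fallback;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(maxDriverThreads(props)); // 4: property unset, default used
    props.setProperty(MAX_DRIVER_THREADS_KEY, "2");
    System.out.println(maxDriverThreads(props)); // 2
    props.setProperty(MAX_DRIVER_THREADS_KEY, "16");
    System.out.println(maxDriverThreads(props)); // 4: out of range
    props.setProperty(MAX_DRIVER_THREADS_KEY, "abc");
    System.out.println(maxDriverThreads(props)); // 4: not a number
  }
}
```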


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r236568719
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -487,6 +487,8 @@ private int getBlockCount(List<ExtendedBlocklet> blocklets) {
         // First prune using default datamap on driver side.
         TableDataMap defaultDataMap = DataMapStoreManager.getInstance().getDefaultDataMap(carbonTable);
         List<ExtendedBlocklet> prunedBlocklets = null;
    +    // This is to log the event, so user will know what is happening by seeing logs.
    +    LOG.info("Started block pruning ...");
    --- End diff --
   
    Instead of adding these logs, I think it would be better to record the time taken for pruning in the statistics.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r236565449
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
     
       public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
     
    +  /**
    +   * max driver threads used for block pruning [1 to 4 threads]
    +   */
    +  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
    +      "carbon.max.driver.threads.for.block.pruning";
    +
    +  public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING_DEFAULT = "4";
    +
    +  // block prune in multi-thread if files size more than 100K files.
    +  public static final int CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT = 100000;
    --- End diff --
   
    Why add this constraint?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r237173126
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -487,6 +487,8 @@ private int getBlockCount(List<ExtendedBlocklet> blocklets) {
         // First prune using default datamap on driver side.
         TableDataMap defaultDataMap = DataMapStoreManager.getInstance().getDefaultDataMap(carbonTable);
         List<ExtendedBlocklet> prunedBlocklets = null;
    +    // This is to log the event, so user will know what is happening by seeing logs.
    +    LOG.info("Started block pruning ...");
    --- End diff --
   
    The log will have a timestamp anyway, so we can subtract the start time from the stop time. I have another PR for non-default datamaps; I will look into this there.
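Either approach needs the elapsed time of the pruning step. A minimal sketch of measuring it directly, so that it could be reported as a statistic rather than reconstructed from log timestamps. `PruningTimer`, `Timed`, and the even-number filter are hypothetical stand-ins, not CarbonData classes or its statistics API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class PruningTimer {
  /** Hypothetical holder pairing a pruning result with its elapsed time. */
  static final class Timed<T> {
    final T result;
    final long elapsedMillis;

    Timed(T result, long elapsedMillis) {
      this.result = result;
      this.elapsedMillis = elapsedMillis;
    }
  }

  /** Runs a pruning step and records how long it took, instead of logging start/stop lines. */
  static <T> Timed<T> time(Supplier<T> pruneStep) {
    long start = System.nanoTime(); // elapsed-time clock, unaffected by wall-clock changes
    T result = pruneStep.get();
    long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    return new Timed<>(result, elapsed);
  }

  public static void main(String[] args) {
    // Stand-in for default-datamap pruning: keep even-numbered "blocks".
    Timed<List<Integer>> pruned = time(() -> {
      List<Integer> kept = new ArrayList<>();
      for (int i = 0; i < 1_000; i++) {
        if (i % 2 == 0) {
          kept.add(i);
        }
      }
      return kept;
    });
    System.out.println("kept " + pruned.result.size() + " blocks in " + pruned.elapsedMillis + " ms");
  }
}
```

`System.nanoTime()` is used because it is intended for measuring elapsed time and is not affected by wall-clock adjustments, which subtracting log timestamps cannot guarantee.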


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r237173725
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
     
       public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
     
    +  /**
    +   * max driver threads used for block pruning [1 to 4 threads]
    +   */
    +  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
    +      "carbon.max.driver.threads.for.block.pruning";
    +
    +  public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING_DEFAULT = "4";
    +
    +  // block prune in multi-thread if files size more than 100K files.
    +  public static final int CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT = 100000;
    --- End diff --
   
    Because multi-threaded pruning on the driver by default may impact concurrent queries. Also, testing showed that pruning 100K datamaps takes about 1 second, so multi-threading is used only when block pruning would take more than a second.
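The reasoning above (stay single-threaded for small tables, and split the work across at most the configured number of threads only once the file count exceeds 100K) can be sketched with a plain `ExecutorService`. This is an illustration under stated assumptions, not the PR's implementation: the generic `T` stands in for the blocklets or datamaps being pruned, the `Predicate` stands in for filter evaluation, and `ParallelPruner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class ParallelPruner {
  // Threshold from the diff: prune in multiple threads only above 100K files.
  static final int MULTI_THREAD_ENABLE_FILES_COUNT = 100_000;

  /** Filters blocks, using up to maxThreads threads only when the input is large. */
  static <T> List<T> prune(List<T> blocks, Predicate<T> filter, int maxThreads) {
    if (blocks.size() <= MULTI_THREAD_ENABLE_FILES_COUNT || maxThreads <= 1) {
      // Small input: stay single-threaded so concurrent queries keep the driver's cores.
      List<T> kept = new ArrayList<>();
      for (T b : blocks) {
        if (filter.test(b)) {
          kept.add(b);
        }
      }
      return kept;
    }
    // Large input: split into contiguous chunks of roughly equal size, one per thread.
    int chunkSize = (blocks.size() + maxThreads - 1) / maxThreads;
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
    try {
      List<Future<List<T>>> futures = new ArrayList<>();
      for (int from = 0; from < blocks.size(); from += chunkSize) {
        List<T> chunk = blocks.subList(from, Math.min(from + chunkSize, blocks.size()));
        futures.add(pool.submit(() -> {
          List<T> kept = new ArrayList<>();
          for (T b : chunk) {
            if (filter.test(b)) {
              kept.add(b);
            }
          }
          return kept;
        }));
      }
      // Merge in submission order so output order matches the single-threaded path.
      List<T> result = new ArrayList<>();
      for (Future<List<T>> f : futures) {
        try {
          result.addAll(f.get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("pruning interrupted", e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("pruning task failed", e.getCause());
        }
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<Integer> blocks = new ArrayList<>();
    for (int i = 0; i < 250_000; i++) {
      blocks.add(i);
    }
    // Keep blocks whose id is divisible by 10 (stand-in for min/max filter pruning).
    List<Integer> kept = prune(blocks, b -> b % 10 == 0, 4);
    System.out.println(kept.size()); // 25000
  }
}
```

The pool is created per call here only to keep the sketch self-contained; a long-running driver would more likely reuse a shared, bounded pool so that concurrent queries cannot multiply the number of pruning threads.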


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r237173860
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
     
       public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
     
    +  /**
    +   * max driver threads used for block pruning [1 to 4 threads]
    +   */
    +  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
    +      "carbon.max.driver.threads.for.block.pruning";
    --- End diff --
   
    I have another PR for non-default datamaps; I will look into this there. I feel this name is also OK.


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r237174006
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -63,6 +75,8 @@
     
       private SegmentPropertiesFetcher segmentPropertiesFetcher;
     
    +  private static final Log LOG = LogFactory.getLog(TableDataMap.class);
    --- End diff --
   
    ok


---