[GitHub] [carbondata] marchpure opened a new pull request #3620: [CARBONDATA-3700] Optimize prune performance when prunning with multi…


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591297558
 
 
   Build Failed  with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/489/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591297945
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2189/
   


[GitHub] [carbondata] marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591348505
 
 
   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591356457
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/498/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591386209
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2197/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591418979
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/500/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-591454224
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2199/
   


[GitHub] [carbondata] marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-593936776
 
 
   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-593943948
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/583/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-593979228
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2289/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594040592
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/589/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594056183
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/590/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594071943
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2296/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594096114
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/591/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594133805
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2297/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594302744
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/593/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594321946
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2300/
   


[GitHub] [carbondata] marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-594379642
 
 
   Case:
   select dtm, hh, user_id, event_id, instance_id, action, collect_time        
        from redods.dw_log_ubt_partition_carbon        
        where user_id = 'user_id_x' and dtm > 'start_dtm' and dtm <= 'end_dtm'          
        limit 100;
   
   
   Data size | 18TB | 42TB | 90TB | 180TB | 270TB
   -- | -- | -- | -- | -- | --
   Query overhead after optimization (s) | 7.90 | 8.09 | 8.48 | 10.44 | 11.68
   Query overhead before optimization (s) | 6.60 | 8.36 | 13.55 | 28.93 | 40.10
   
   


[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387513087
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
 ##########
 @@ -1843,6 +1846,33 @@ public static int getNumOfThreadsForPruning() {
     return numOfThreadsForPruning;
   }
 
+  /**
+   * This method validates the driverPruningMultiThreadEnableFilesCount
+   */
+  public static int getDriverPruningMultiThreadEnableFilesCount() {
+    int driverPruningMultiThreadEnableFilesCount = Integer.parseInt(CarbonProperties.getInstance()
 
 Review comment:
   Parsing the user-configured value should be moved inside the try block.
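
   A minimal sketch of what this suggestion could look like, not the code from the PR. It assumes `java.util.Properties` as a stand-in for `CarbonProperties`; the class `PruningThresholdSketch` and the method `getFilesCountThreshold` are hypothetical names. The property key and the default of 100000 come from the documentation row added in this PR. Treating non-positive values as invalid is also an assumption.

```java
import java.util.Properties;

final class PruningThresholdSketch {
  // Key and default as listed in the docs/configuration-parameters.md row of this PR.
  static final String KEY = "carbon.driver.pruning.multi.thread.enable.files.count";
  static final int DEFAULT_FILES_COUNT = 100000;

  // Hypothetical helper: java.util.Properties stands in for CarbonProperties.
  static int getFilesCountThreshold(Properties props) {
    String raw = props.getProperty(KEY, String.valueOf(DEFAULT_FILES_COUNT));
    try {
      // Parsing happens inside the try block, so a malformed user value
      // falls back to the default instead of throwing to the caller.
      int value = Integer.parseInt(raw.trim());
      // Assumption: zero or negative thresholds are treated as invalid.
      return value > 0 ? value : DEFAULT_FILES_COUNT;
    } catch (NumberFormatException e) {
      return DEFAULT_FILES_COUNT;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(KEY, "not-a-number");
    // The malformed value is rejected and the default is returned.
    System.out.println(getFilesCountThreshold(props));
  }
}
```

   With the parse inside the `try`, a malformed setting falls back to the default instead of failing the query in the driver.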


[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

GitBox
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#discussion_r387523821
 
 

 ##########
 File path: docs/configuration-parameters.md
 ##########
 @@ -145,6 +145,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.push.rowfilters.for.vector | false | When enabled complete row filters will be handled by carbon in case of vector. If it is disabled then only page level pruning will be done by carbon and row level filtering will be done by spark for vector. And also there are scan optimizations in carbon to avoid multiple data copies when this parameter is set to false. There is no change in flow for non-vector based queries. |
 | carbon.query.prefetch.enable | true | By default this property is true, so prefetch is used in query to read next blocklet asynchronously in other thread while processing current blocklet in main thread. This can help to reduce CPU idle time. Setting this property false will disable this prefetch feature in query. |
 | carbon.query.stage.input.enable | false | Stage input files are data files written by external applications (such as Flink), but have not been loaded into carbon table. Enabling this configuration makes query to include these files, thus makes query on latest data. However, since these files are not indexed, query maybe slower as full scan is required for these files. |
+| carbon.driver.pruning.multi.thread.enable.files.count | 100000 | To prune in multi-thread when total number of files of queried segments beyonds the configured value. |
 
 Review comment:
   ```suggestion
   | carbon.driver.pruning.multi.thread.enable.files.count | 100000 | To prune in multi-thread when total number of segment files for a query increases beyond the configured value. |
   ```
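
   To illustrate the behaviour this row describes, here is a rough sketch of the threshold check. It is not the actual CarbonData implementation: `PruningModeSketch` and `useMultiThreadPruning` are hypothetical names, the per-segment file counts are made-up inputs, and the strict "greater than" comparison is an assumption based on the word "beyond".

```java
import java.util.Arrays;
import java.util.List;

final class PruningModeSketch {
  // Hypothetical helper: returns true when the total number of files across
  // the queried segments exceeds the configured threshold.
  static boolean useMultiThreadPruning(List<Integer> filesPerSegment, int threshold) {
    long totalFiles = 0;
    for (int count : filesPerSegment) {
      totalFiles += count;
    }
    // Assumption: "beyond the configured value" means strictly greater.
    return totalFiles > threshold;
  }

  public static void main(String[] args) {
    // Illustrative input only: two segments with 60000 and 50000 files.
    List<Integer> segments = Arrays.asList(60000, 50000);
    System.out.println(useMultiThreadPruning(segments, 100000));
  }
}
```

   In this sketch, a query that touches fewer files than the threshold stays on the single-threaded path.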
