[GitHub] [carbondata] ajantha-bhat opened a new pull request #3858: [CARBONDATA-3919] Improve concurrent query performance


GitBox

ajantha-bhat opened a new pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858


    ### Why is this PR needed?
    Problem 1: When 500 queries are executed concurrently, the checkIfRefreshIsNeeded method is a bottleneck.
    The whole method was synchronized, so only one thread could work at a time.
    Synchronization is actually required only for the part that drops tables after a schema modification, not for the whole method.

    Problem 2: TokenCache.obtainTokensForNamenodes was causing a performance bottleneck for concurrent queries,
    so the call was removed.

    ### What changes were proposed in this PR?
    For problem 1: Synchronize only the part that removes tables. The total time for 500 queries improved from 10 seconds to 3 seconds on a cluster.

    For problem 2: Avoid calling the API.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
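
The change described for problem 1 can be illustrated with a minimal sketch. This is not the CarbonData implementation: the class `TableCacheSketch`, its fields, and the meaning of the return value are hypothetical and chosen only to show the idea of locking the removal path instead of the whole method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a table metadata cache; not CarbonData's actual class. */
class TableCacheSketch {
  // Table name -> schema modification time recorded when the entry was cached.
  private final Map<String, Long> cachedModifiedTime = new ConcurrentHashMap<>();
  // Guards only the check-then-remove step for stale entries.
  private final Object removeLock = new Object();

  void put(String table, long modifiedTime) {
    cachedModifiedTime.put(table, modifiedTime);
  }

  boolean contains(String table) {
    return cachedModifiedTime.containsKey(table);
  }

  /**
   * Returns true if the caller has to (re)load the table.
   * The common path (schema unchanged) takes no lock, so concurrent
   * queries do not serialize on it.
   */
  boolean checkIfRefreshIsNeeded(String table, long currentModifiedTime) {
    Long cached = cachedModifiedTime.get(table);
    if (cached == null) {
      return true; // not cached yet
    }
    if (cached == currentModifiedTime) {
      return false; // schema unchanged: lock-free fast path
    }
    synchronized (removeLock) {
      // Re-check under the lock: another thread may already have
      // removed or refreshed the entry.
      Long latest = cachedModifiedTime.get(table);
      if (latest != null && latest != currentModifiedTime) {
        cachedModifiedTime.remove(table);
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    TableCacheSketch cache = new TableCacheSketch();
    cache.put("t1", 100L);

    // Several queries check an unchanged table concurrently.
    Thread[] readers = new Thread[8];
    for (int i = 0; i < readers.length; i++) {
      readers[i] = new Thread(() -> cache.checkIfRefreshIsNeeded("t1", 100L));
      readers[i].start();
    }
    for (Thread t : readers) {
      t.join();
    }
    System.out.println("cached after unchanged checks: " + cache.contains("t1"));

    // A schema change makes the entry stale, so it is dropped under the lock.
    cache.checkIfRefreshIsNeeded("t1", 200L);
    System.out.println("cached after schema change: " + cache.contains("t1"));
  }
}
```

In this sketch only the rare stale-entry path contends on `removeLock`, which is the same trade-off the PR description reports for the 500-query case.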
   
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

ajantha-bhat commented on a change in pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#discussion_r458722832



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java
##########
@@ -472,9 +471,6 @@ public IndexFilter getFilterPredicates(Configuration configuration) {
     QueryStatisticsRecorder recorder = CarbonTimeStatisticsFactory.createDriverRecorder();
     QueryStatistic statistic = new QueryStatistic();
 
-    // get tokens for all the required FileSystem for table path

Review comment:
       @Reviewers: let me know if removing this call has any impact. I removed it and tested in a cluster, and didn't find any problem.
   So, is it required? Is this API call responsible for renewing tokens?




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-662485004


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1727/
   


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-662495412


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3469/
   


[GitHub] [carbondata] ajantha-bhat commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

ajantha-bhat commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-671171540


   retest this please


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

ajantha-bhat commented on a change in pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#discussion_r467696850



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java
##########
@@ -472,9 +471,6 @@ public IndexFilter getFilterPredicates(Configuration configuration) {
     QueryStatisticsRecorder recorder = CarbonTimeStatisticsFactory.createDriverRecorder();
     QueryStatistic statistic = new QueryStatistic();
 
-    // get tokens for all the required FileSystem for table path

Review comment:
       I also removed it and tested in a user environment; no issues were observed.




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-671218383


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3672/
   


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-671226040


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1933/
   


[GitHub] [carbondata] ajantha-bhat commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

ajantha-bhat commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-671226769


   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-671287567


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3677/
   


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-671330199


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1938/
   


[GitHub] [carbondata] akashrn5 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

akashrn5 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675231491


   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675268681


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2009/
   


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675270450


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3750/
   


[GitHub] [carbondata] ajantha-bhat commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

ajantha-bhat commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675272382


   @akashrn5: the 2.4.5 build has a random failure, which is also observed in other PRs. You can merge this.


[GitHub] [carbondata] ajantha-bhat commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

ajantha-bhat commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675277686


   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675349093


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3756/
   


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675351077


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2015/
   


[GitHub] [carbondata] akashrn5 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

GitBox

akashrn5 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675355454


   LGTM

