[GitHub] [carbondata] xiaohui0318 opened a new pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all



GitBox

xiaohui0318 opened a new pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788


    ### Why is this PR needed?
    When a query is executed, the metadata of all databases is scanned. If the number of databases is very large, this adds a long delay to every query and reduces query performance.
   
    ### What changes were proposed in this PR?
   Scan only the relevant databases instead of scanning all of them.
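Here is a minimal Scala sketch of the idea. It assumes a simplified catalog that maps each database name to the MV schemas stored in it, with table names qualified as `db.table`. `MvSchema`, `MvLookup`, `scanAll` and `scanRelevant` are hypothetical names, not CarbonData classes. The sketch only contrasts how many databases each strategy visits; it is not the PR's implementation.

```scala
// Hypothetical model of MV metadata; not CarbonData's actual classes.
final case class MvSchema(name: String, database: String, sourceTables: Set[String])

object MvLookup {
  // Before: every database in the catalog is visited for every query.
  // Returns the MVs found and the number of databases visited.
  def scanAll(catalog: Map[String, Seq[MvSchema]]): (Seq[MvSchema], Int) =
    (catalog.values.flatten.toSeq, catalog.size)

  // After (the idea proposed here): visit only the databases that the
  // query's tables belong to. Tables are assumed to be written as "db.table".
  def scanRelevant(catalog: Map[String, Seq[MvSchema]],
                   queryTables: Set[String]): (Seq[MvSchema], Int) = {
    val dbs = queryTables.map(_.takeWhile(_ != '.'))
    val visited = dbs.toSeq.flatMap(db => catalog.get(db))
    (visited.flatten, visited.size)
  }
}

object Demo extends App {
  // 1,500 unrelated databases plus one that holds an MV on sales.orders.
  val catalog: Map[String, Seq[MvSchema]] =
    (1 to 1500).map(i => s"db$i" -> Seq.empty[MvSchema]).toMap +
      ("sales" -> Seq(MvSchema("sales_mv", "sales", Set("sales.orders"))))

  val (_, visitedAll) = MvLookup.scanAll(catalog)
  val (mvs, visitedRelevant) = MvLookup.scanRelevant(catalog, Set("sales.orders"))
  println(s"$visitedAll $visitedRelevant ${mvs.map(_.name).mkString(",")}")
}
```

Running `Demo` prints `1501 1 sales_mv`. The full scan visits every database, while the narrowed scan visits only `sales`. Note that the narrowed scan can only find MVs that are stored in the same databases as the query's tables.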
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639250055


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3136/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639250322


   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1412/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639378003


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1414/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639378978


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3138/
   



[GitHub] [carbondata] niuge01 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

niuge01 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-645329423


   Consider an MV created in database A as a select from a table in database B. When that table is queried, we should check all MVs built on it, not only the ones in database B.



[GitHub] [carbondata] xiaohui0318 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

xiaohui0318 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-645348038


   > Consider an MV created in database A as a select from a table in database B. When that table is queried, we should check all MVs built on it, not only the ones in database B.
   
   We have a cluster with more than 1,500 databases, and this scan adds more than 20 seconds to every SELECT statement, no matter how simple the SQL is. Perhaps the MV should be placed in a specified database?



[GitHub] [carbondata] VenuReddy2103 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

VenuReddy2103 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-662573078


   If the MV is in a different database from its source tables, then with this change we will not be able to rewrite the plan. Consider the inner join example below:
   ```scala
   // assumes `spark` is a SparkSession with CarbonData support
   spark.sql("create database db1")
   spark.sql("create database db2")
   spark.sql("create database db3")
   // first source table lives in db1
   spark.sql("use db1")
   spark.sql("create table db1_table(a int, b int) stored as carbondata")
   spark.sql("insert into db1_table select 1, 2")
   // second source table lives in db2
   spark.sql("use db2")
   spark.sql("create table db2_table(i int, j int) stored as carbondata")
   spark.sql("insert into db2_table select 1, 4")
   // the MV joining both tables lives in db3
   spark.sql("use db3")
   spark.sql("create materialized view db3_mv as select t1.a,t2.i from db1.db1_table t1,db2.db2_table t2 where t1.a=t2.i")
   // inspect whether the plan reads from db3_mv
   spark.sql("explain select t1.a, t2.i from db1.db1_table t1, db2.db2_table t2 where t1.a=t2.i").show(100, false)
   ```
   
   If `hasSuitableMV()` gets `getValidSchemas` only from `db1` and `db2`, it does not find `db3_mv`, so the plan is not rewritten.
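To make this failure mode concrete, here is a small self-contained Scala model of the example above. `MvSchema`, `CrossDbMvExample`, `restricted` and `full` are hypothetical names, not the actual `hasSuitableMV()`/`getValidSchemas` code. Suitability is simplified to "the MV is built on exactly the query's tables"; a real matcher would presumably also compare projections, filters and join conditions.

```scala
// Hypothetical model of MV metadata; not CarbonData's actual classes.
final case class MvSchema(name: String, database: String, sourceTables: Set[String])

object CrossDbMvExample {
  // Catalog keyed by the database each MV is stored in, mirroring the example.
  val catalog: Map[String, Seq[MvSchema]] = Map(
    "db1" -> Seq.empty[MvSchema],
    "db2" -> Seq.empty[MvSchema],
    "db3" -> Seq(MvSchema("db3_mv", "db3", Set("db1.db1_table", "db2.db2_table")))
  )

  // Tables referenced by the explain query, qualified as "db.table".
  val queryTables: Set[String] = Set("db1.db1_table", "db2.db2_table")

  // Simplified suitability check: the MV must be built on exactly the query's tables.
  private def covers(mv: MvSchema): Boolean = mv.sourceTables == queryTables

  // Narrowed lookup: visit only the databases of the query's source tables.
  def restricted: Seq[MvSchema] =
    queryTables.toSeq
      .map(_.takeWhile(_ != '.'))
      .distinct
      .flatMap(db => catalog.getOrElse(db, Seq.empty[MvSchema]))
      .filter(covers)

  // Full lookup: visit every database, then keep MVs built on the query's tables.
  def full: Seq[MvSchema] =
    catalog.values.flatten.toSeq.filter(covers)

  def main(args: Array[String]): Unit = {
    println(s"restricted: ${restricted.map(_.name).mkString(",")}")
    println(s"full: ${full.map(_.name).mkString(",")}")
  }
}
```

Running it prints `restricted:` with nothing after it, followed by `full: db3_mv`. The full lookup finds the MV only because it also visits `db3`, which is the per-query cost this PR is trying to avoid.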



[GitHub] [carbondata] vikramahuja1001 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

vikramahuja1001 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-697198143


   @VenuReddy2103, maybe such a test case could be added to the code.

