xiaohui0318 opened a new pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788

### Why is this PR needed?
When a query is executed, the metadata of all databases is scanned. If the number of databases is very large, this scan adds significant time to every query and reduces query performance.

### What changes were proposed in this PR?
Scan only the relevant database instead of scanning all databases.

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639250055

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3136/

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639250322

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1412/

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639378003

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1414/

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639378978

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3138/

niuge01 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-645329423

An MV may be created in database A as a select from a table in database B. When that table is queried, we should check all MVs on the table, not only those in database B.

xiaohui0318 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-645348038

> An MV may be created in database A as a select from a table in database B. When that table is queried, we should check all MVs on the table, not only those in database B.

We have a cluster with more than 1,500 databases, and the scan takes more than 20 seconds for every select statement, no matter how simple the SQL is. I wonder whether the MV should be placed in a specified database?

VenuReddy2103 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-662573078

If an MV is in a different database than its source tables, we will not be able to rewrite the plan. Consider the inner join example below.

`spark.sql("create database db1")`
`spark.sql("create database db2")`
`spark.sql("create database db3")`
`spark.sql("use db1")`
`spark.sql("create table db1_table(a int, b int) stored as carbondata")`
`spark.sql("insert into db1_table select 1, 2")`
`spark.sql("use db2")`
`spark.sql("create table db2_table(i int, j int) stored as carbondata")`
`spark.sql("insert into db2_table select 1, 4")`
`spark.sql("use db3")`
`spark.sql("create materialized view db3_mv as select t1.a,t2.i from db1.db1_table t1,db2.db2_table t2 where t1.a=t2.i")`
`spark.sql("explain select t1.a, t2.i from db1.db1_table t1, db2.db2_table t2 where t1.a=t2.i").show(100, false)`

If we get `getValidSchemas` only from `db1` and `db2` in `hasSuitableMV()`, we will not find a suitable MV and will not rewrite the plan.

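The trade-off discussed in this thread can be sketched with a small, self-contained toy model. This is only an illustration and not CarbonData code: `TableId`, `MvSchema`, `catalog`, `scanAllDatabases`, and `scanSourceDatabases` are hypothetical names, and the rule "an MV is a candidate when all of its source tables appear in the query" is a simplifying assumption rather than the real matching logic.

```scala
// Toy model only: every name below is hypothetical and is not a CarbonData API.
final case class TableId(database: String, table: String)
final case class MvSchema(database: String, name: String, sources: Set[TableId])

object MvLookupSketch {
  // MV metadata grouped by the database in which each MV was created,
  // mirroring the db1/db2/db3 example from the comment above.
  val catalog: Map[String, Seq[MvSchema]] = Map(
    "db1" -> Seq.empty,
    "db2" -> Seq.empty,
    "db3" -> Seq(
      MvSchema("db3", "db3_mv",
        Set(TableId("db1", "db1_table"), TableId("db2", "db2_table"))))
  )

  // Simplifying assumption: an MV is a candidate when all of its source
  // tables appear among the tables referenced by the query.
  private def covers(mv: MvSchema, queryTables: Set[TableId]): Boolean =
    mv.sources.subsetOf(queryTables)

  // Strategy 1: look at every database. Complete, but the work grows with
  // the number of databases.
  def scanAllDatabases(queryTables: Set[TableId]): Seq[MvSchema] =
    catalog.values.flatten.filter(covers(_, queryTables)).toSeq

  // Strategy 2: look only at the databases that own the queried tables.
  // Cheaper, but blind to MVs created in any other database.
  def scanSourceDatabases(queryTables: Set[TableId]): Seq[MvSchema] =
    queryTables.map(_.database).toSeq
      .flatMap(db => catalog.getOrElse(db, Seq.empty))
      .filter(covers(_, queryTables))

  def main(args: Array[String]): Unit = {
    val queryTables =
      Set(TableId("db1", "db1_table"), TableId("db2", "db2_table"))
    println(scanAllDatabases(queryTables).map(_.name))
    println(scanSourceDatabases(queryTables).map(_.name))
  }
}
```

In this model the full scan finds `db3_mv`, while the restricted scan returns nothing because it never looks at `db3`, which is the case described in the comment above.
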
vikramahuja1001 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-697198143

@VenuReddy2103, maybe such a test case can be added to the code.