[GitHub] [carbondata] xiaohui0318 opened a new pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

GitBox

[GitHub] [carbondata] xiaohui0318 opened a new pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

xiaohui0318 opened a new pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788

### Why is this PR needed?
When a query is executed, the metadata of all databases is scanned. If the number of databases is very large, this adds significant time to every query and reduces query performance.

### What changes were proposed in this PR?
Scan only the relevant database instead of scanning all databases.

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639250055

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3136/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639250322

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1412/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639378003

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1414/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

CarbonDataQA1 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-639378978

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3138/

GitBox

[GitHub] [carbondata] niuge01 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

niuge01 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-645329423

An MV may be created in database A as a select from a table in database B. When that table is queried, we should check all MVs on the table, not only those in database B.

GitBox

[GitHub] [carbondata] xiaohui0318 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

xiaohui0318 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-645348038

> An MV may be created in database A as a select from a table in database B. When that table is queried, we should check all MVs on the table, not only those in database B.

We have a cluster with more than 1,500 databases, and the scan takes more than 20 seconds for every select statement, no matter how simple the SQL. I wonder whether it should be placed in a specified database?

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

VenuReddy2103 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-662573078

If an MV is in a different database than its source tables, then with this change we will not be able to rewrite the plan. Consider the inner join example below.

```scala
// The source tables live in db1 and db2.
spark.sql("create database db1")
spark.sql("create database db2")
spark.sql("create database db3")
spark.sql("use db1")
spark.sql("create table db1_table(a int, b int) stored as carbondata")
spark.sql("insert into db1_table select 1, 2")
spark.sql("use db2")
spark.sql("create table db2_table(i int, j int) stored as carbondata")
spark.sql("insert into db2_table select 1, 4")
// The MV over both tables is created in a third database, db3.
spark.sql("use db3")
spark.sql("create materialized view db3_mv as select t1.a,t2.i from db1.db1_table t1,db2.db2_table t2 where t1.a=t2.i")
// Inspect the plan to see whether the query is rewritten to use db3_mv.
spark.sql("explain select t1.a, t2.i from db1.db1_table t1, db2.db2_table t2 where t1.a=t2.i").show(100, false)
```

If we call `getValidSchemas` only for `db1` and `db2` in `hasSuitableMV()`, we do not find the MV and do not rewrite the plan.
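
The lookup problem in the example above can be modelled without Spark. The following is only a minimal sketch: `TableRef`, `MvSchema`, `MvLookupSketch`, `scanAll` and `scanSourceDatabases` are hypothetical names invented for illustration, not CarbonData classes or methods, and the matching rule (an MV qualifies when all of its source tables appear in the query) is a deliberate simplification of real plan matching.

```scala
// Hypothetical model of MV lookup; none of these types are CarbonData APIs.
final case class TableRef(database: String, table: String)
final case class MvSchema(database: String, name: String, sources: Set[TableRef])

object MvLookupSketch {
  // Catalog keyed by the database that stores each MV, mirroring the example:
  // db3_mv is stored in db3 but reads from db1.db1_table and db2.db2_table.
  val catalog: Map[String, Seq[MvSchema]] = Map(
    "db1" -> Seq.empty,
    "db2" -> Seq.empty,
    "db3" -> Seq(
      MvSchema("db3", "db3_mv",
        Set(TableRef("db1", "db1_table"), TableRef("db2", "db2_table"))))
  )

  // Simplified matching rule (an assumption): an MV is a candidate when
  // every table it reads appears among the tables of the query.
  private def candidates(schemas: Iterable[MvSchema],
                         queryTables: Set[TableRef]): Seq[MvSchema] =
    schemas.filter(_.sources.subsetOf(queryTables)).toSeq

  // Lookup over every database: the work grows with the number of databases.
  def scanAll(queryTables: Set[TableRef]): Seq[MvSchema] =
    candidates(catalog.values.flatten, queryTables)

  // Lookup restricted to the databases of the tables named in the query.
  def scanSourceDatabases(queryTables: Set[TableRef]): Seq[MvSchema] = {
    val databases = queryTables.map(_.database).toSeq
    candidates(databases.flatMap(db => catalog.getOrElse(db, Seq.empty)), queryTables)
  }

  def main(args: Array[String]): Unit = {
    val query = Set(TableRef("db1", "db1_table"), TableRef("db2", "db2_table"))
    println(scanAll(query).map(_.name))             // finds db3_mv
    println(scanSourceDatabases(query).map(_.name)) // finds nothing
  }
}
```

In this model the restricted lookup visits only `db1` and `db2`, so it never sees the MV stored in `db3`, while the full scan finds it at the cost of touching every database.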

GitBox

[GitHub] [carbondata] vikramahuja1001 commented on pull request #3788: [CARBONDATA-3844]Fix scan the relevant database instead of scanning all

vikramahuja1001 commented on pull request #3788:
URL: https://github.com/apache/carbondata/pull/3788#issuecomment-697198143

@VenuReddy2103, maybe such a test case can be added to the code.
