Indhumathi27 opened a new pull request #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [hidden email]
With regards, Apache Git Services
CarbonDataQA1 commented on issue #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-607309184
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2608/
CarbonDataQA1 commented on issue #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-607312723
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/900/
CarbonDataQA1 commented on issue #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-607415544
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/905/
CarbonDataQA1 commented on issue #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-607416194
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2614/
CarbonDataQA1 commented on issue #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-607776800
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2619/
CarbonDataQA1 commented on issue #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-607777345
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/910/
CarbonDataQA1 commented on issue #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-608010488
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/914/
CarbonDataQA1 commented on issue #3690: [WIP] Block creating materilaized view with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-608012817
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2623/
akashrn5 commented on a change in pull request #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#discussion_r403123959

File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/MVMatcher.scala

@@ -1342,6 +1343,40 @@ private object SelectSelectGroupbyChildDelta
     }
   }
+  /**
+   * Removes duplicate projection in the output list for query matching
+   */
+  def getDistinctOutputList(outputList: Seq[NamedExpression]): Seq[NamedExpression] = {
+    var distinctOList: Seq[NamedExpression] = Seq.empty
+    outputList.foreach { output =>
+      if (distinctOList.isEmpty) {
+        distinctOList = distinctOList :+ output
+      } else {
+        // get output name
+        var outputName = output.name
+        if (output.isInstanceOf[Alias]) {
+          // In case of queries with join on more than one table and projection list having
+          // aggregation of same column name on join tables like sum(t1.column), sum(t2.column),
+          // in that case, compare alias name with column id, as alias name will be same for
+          // both output(sum(t1))
+          val projectName = output.simpleString
+          outputName = projectName.substring(0, projectName.indexOf(" AS"))
+        }
+        if (!distinctOList.exists(oList =>
+          if (oList.isInstanceOf[Alias]) {

Review comment: Consider renaming `oList` to something more descriptive, for example `outPutColumn`.
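The idea behind the hunk above (drop repeated projections while keeping aggregates over different join tables apart, even when their alias names collide) can be illustrated with a minimal, self-contained sketch. It is not the PR's implementation: `Expr`, `Column`, `Aliased`, and `DistinctOutput` are hypothetical stand-ins for Spark's `NamedExpression` and `Alias`, and the comparison key (the underlying expression text for aliases, the qualified name for plain columns) is an assumption made for illustration.

```scala
// Hypothetical stand-ins for Catalyst expressions; not the real Spark classes.
sealed trait Expr { def name: String }
final case class Column(qualifier: String, name: String) extends Expr
final case class Aliased(childSql: String, name: String) extends Expr

object DistinctOutput {
  // Assumed comparison key: the aliased expression's text, or the qualified column name.
  // Comparing by alias name alone would merge sum(t1.c) and sum(t2.c) when both are named "sum(c)".
  private def key(e: Expr): String = e match {
    case Aliased(childSql, _)    => childSql
    case Column(qualifier, name) => s"$qualifier.$name"
  }

  // Keeps the first occurrence of each key and preserves the input order.
  def distinctOutputList(output: Seq[Expr]): Seq[Expr] =
    output.foldLeft((Set.empty[String], Vector.empty[Expr])) {
      case ((seen, kept), e) =>
        val k = key(e)
        if (seen(k)) (seen, kept) else (seen + k, kept :+ e)
    }._2
}
```

With the projection list `t1.name, t1.name, sum(t1.c_code), sum(t2.c_code)`, where both aggregates carry the same alias name, this sketch keeps three entries: the repeated `t1.name` is dropped, and the two sums stay separate because their underlying expressions differ.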
akashrn5 commented on a change in pull request #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#discussion_r403125908

File path: mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/TestAllOperationsOnMV.scala

@@ -626,6 +626,32 @@ class TestAllOperationsOnMV extends QueryTest with BeforeAndAfterEach {
     sql("drop table IF EXISTS maintable")
   }
+  test("test duplicate column name in mv") {

Review comment: Please add a couple of test cases:
1. one column with an alias and one with the direct column name
2. both columns with aliases
akashrn5 commented on a change in pull request #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#discussion_r403127008

File path: integration/spark/src/main/scala/org/apache/carbondata/view/MVHelper.scala

@@ -291,6 +307,23 @@ object MVHelper {
     fieldsMap
   }
+  private def findDuplicateColumns(fieldColumnsMap: util.HashMap[String, util.ArrayList[String]],

Review comment: Please handle the case where one column uses its direct name and the other uses an alias.
Indhumathi27 commented on a change in pull request #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#discussion_r403436053

File path: mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/TestAllOperationsOnMV.scala

@@ -626,6 +626,32 @@ class TestAllOperationsOnMV extends QueryTest with BeforeAndAfterEach {
     sql("drop table IF EXISTS maintable")
   }
+  test("test duplicate column name in mv") {

Review comment: added
Indhumathi27 commented on a change in pull request #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#discussion_r403436102

File path: integration/spark/src/main/scala/org/apache/carbondata/view/MVHelper.scala

@@ -291,6 +307,23 @@ object MVHelper {
     fieldsMap
   }
+  private def findDuplicateColumns(fieldColumnsMap: util.HashMap[String, util.ArrayList[String]],

Review comment: Handled
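The case discussed above (the same expression projected once by its direct name and once under an alias) can be sketched as follows. This is only an illustration under stated assumptions, not the body of `findDuplicateColumns`, which the thread does not show: `Projection` and `DuplicateCheck` are hypothetical names, and representing each projection as expression text plus an optional alias is a simplification.

```scala
// Hypothetical model of one entry in a SELECT list: expression text plus optional alias.
final case class Projection(exprSql: String, alias: Option[String])

object DuplicateCheck {
  // Returns, sorted, every expression that is projected under more than one output name.
  // An unaliased projection (None) and an aliased one (Some("a")) count as different names.
  def duplicatesWithDifferentAlias(projections: Seq[Projection]): Seq[String] =
    projections
      .groupBy(_.exprSql)
      .collect { case (expr, ps) if ps.map(_.alias).distinct.size > 1 => expr }
      .toSeq
      .sorted
}
```

For `select name, sum(c_code), sum(c_code) as a`, the sketch reports `sum(c_code)`, which is the situation the PR rejects with `MalformedMVCommandException` in its tests. Exact repeats such as `name, name` are not reported by this sketch; how the PR treats them at creation time is not stated in the thread.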
CarbonDataQA1 commented on issue #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-609002041
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/928/
CarbonDataQA1 commented on issue #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-609003792
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2637/
Indhumathi27 commented on issue #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#issuecomment-610174540
@akashrn5 Please review and merge
akashrn5 commented on a change in pull request #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#discussion_r404580379

File path: mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/TestAllOperationsOnMV.scala

@@ -626,6 +626,49 @@ class TestAllOperationsOnMV extends QueryTest with BeforeAndAfterEach {
     sql("drop table IF EXISTS maintable")
   }
+  test("test duplicate column name in mv") {
+    sql("drop table IF EXISTS maintable")
+    sql("create table maintable(name string, c_code int, price int) STORED AS carbondata")
+    sql("insert into table maintable values('abc',21,2000),('mno',24,3000)")
+    sql("drop materialized view if exists mv1")
+    val res1 = sql("select name,sum(c_code) from maintable group by name")
+    val res2 = sql("select name, name,sum(c_code),sum(c_code) from maintable group by name")
+    val res3 = sql("select c_code,price from maintable")
+    sql("create materialized view mv1 as select name,sum(c_code) from maintable group by name")
+    val df1 = sql("select name,sum(c_code) from maintable group by name")
+    TestUtil.verifyMVDataMap(df1.queryExecution.optimizedPlan, "mv1")
+    checkAnswer(res1, df1)
+    val df2 = sql("select name, name,sum(c_code),sum(c_code) from maintable group by name")
+    TestUtil.verifyMVDataMap(df2.queryExecution.optimizedPlan, "mv1")
+    checkAnswer(df2, res2)
+    sql("drop materialized view if exists mv2")
+    sql("create materialized view mv2 as select c_code,price from maintable")
+    val df3 = sql("select c_code,price from maintable")
+    TestUtil.verifyMVDataMap(df3.queryExecution.optimizedPlan, "mv2")
+    checkAnswer(res3, df3)
+    val df4 = sql("select c_code,price,price,c_code from maintable")
+    TestUtil.verifyMVDataMap(df4.queryExecution.optimizedPlan, "mv2")
+    checkAnswer(df4, Seq(Row(21,2000,2000,21), Row(24,3000,3000,24)))
+    sql("drop table IF EXISTS maintable")
+  }
+
+  test("test duplicate column with different alias name") {
+    sql("drop table IF EXISTS maintable")
+    sql("create table maintable(name string, c_code int, price int) STORED AS carbondata")
+    sql("insert into table maintable values('abc',21,2000),('mno',24,3000)")
+    sql("drop materialized view if exists mv1")
+    intercept[MalformedMVCommandException] {
+      sql("create materialized view mv1 as select name,sum(c_code),sum(c_code) as a from maintable group by name")
+    }.getMessage.contains("Cannot create mv having duplicate column with different alias name: sum(CAST(maintable.`c_code` AS BIGINT)) AS `a`")

Review comment: Can we report a simpler column name than `sum(CAST(maintable.`c_code` AS BIGINT)) AS `a``? I think that in large queries it will be difficult to locate the offending column from this text.
Indhumathi27 commented on a change in pull request #3690: [CARBONDATA-3762] Block creating Materialized view's with duplicate column
URL: https://github.com/apache/carbondata/pull/3690#discussion_r404585118

File path: mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/TestAllOperationsOnMV.scala

@@ -626,6 +626,49 @@ class TestAllOperationsOnMV extends QueryTest with BeforeAndAfterEach {
     sql("drop table IF EXISTS maintable")
   }
+  test("test duplicate column name in mv") {
+    sql("drop table IF EXISTS maintable")
+    sql("create table maintable(name string, c_code int, price int) STORED AS carbondata")
+    sql("insert into table maintable values('abc',21,2000),('mno',24,3000)")
+    sql("drop materialized view if exists mv1")
+    val res1 = sql("select name,sum(c_code) from maintable group by name")
+    val res2 = sql("select name, name,sum(c_code),sum(c_code) from maintable group by name")
+    val res3 = sql("select c_code,price from maintable")
+    sql("create materialized view mv1 as select name,sum(c_code) from maintable group by name")
+    val df1 = sql("select name,sum(c_code) from maintable group by name")
+    TestUtil.verifyMVDataMap(df1.queryExecution.optimizedPlan, "mv1")
+    checkAnswer(res1, df1)
+    val df2 = sql("select name, name,sum(c_code),sum(c_code) from maintable group by name")
+    TestUtil.verifyMVDataMap(df2.queryExecution.optimizedPlan, "mv1")
+    checkAnswer(df2, res2)
+    sql("drop materialized view if exists mv2")
+    sql("create materialized view mv2 as select c_code,price from maintable")
+    val df3 = sql("select c_code,price from maintable")
+    TestUtil.verifyMVDataMap(df3.queryExecution.optimizedPlan, "mv2")
+    checkAnswer(res3, df3)
+    val df4 = sql("select c_code,price,price,c_code from maintable")
+    TestUtil.verifyMVDataMap(df4.queryExecution.optimizedPlan, "mv2")
+    checkAnswer(df4, Seq(Row(21,2000,2000,21), Row(24,3000,3000,24)))
+    sql("drop table IF EXISTS maintable")
+  }
+
+  test("test duplicate column with different alias name") {
+    sql("drop table IF EXISTS maintable")
+    sql("create table maintable(name string, c_code int, price int) STORED AS carbondata")
+    sql("insert into table maintable values('abc',21,2000),('mno',24,3000)")
+    sql("drop materialized view if exists mv1")
+    intercept[MalformedMVCommandException] {
+      sql("create materialized view mv1 as select name,sum(c_code),sum(c_code) as a from maintable group by name")
+    }.getMessage.contains("Cannot create mv having duplicate column with different alias name: sum(CAST(maintable.`c_code` AS BIGINT)) AS `a`")

Review comment: I call (column).sql directly to display the columns because, for a join across multiple tables, the column has to be displayed with its qualifier. In that scenario it would be difficult to extract only the column name, so I think it is better to keep the message as it is.
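The trade-off raised in this exchange can be seen in a small sketch: without the table qualifier, two aggregates over same-named columns from different join tables produce identical message text. The names `ColumnRef` and `MessageSketch` are hypothetical, the message wording is copied from the test in the diff, and the `CAST` that Spark's `.sql` adds is left out for brevity.

```scala
// Hypothetical reference to a column of a specific table in a join.
final case class ColumnRef(table: String, column: String) {
  def qualified: String = s"$table.`$column`"
}

object MessageSketch {
  // Builds the error text with or without the table qualifier (illustration only).
  def duplicateMessage(ref: ColumnRef, alias: String, withQualifier: Boolean): String = {
    val shown = if (withQualifier) ref.qualified else ref.column
    s"Cannot create mv having duplicate column with different alias name: sum($shown) AS `$alias`"
  }
}
```

For `t1.c_code` and `t2.c_code`, the unqualified messages are the same string, so a reader could not tell which table's column was duplicated; the qualified messages differ. That is the reason given above for keeping the longer form.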