GitHub user praveenmeenakshi56 opened a pull request:
https://github.com/apache/carbondata/pull/2390 [CARBONDATA-2624] Added validations for complex dataType columns in create table command for Local Dictionary Support Added Validations for Complex DataType command in create table command. Added Unit Test cases for the same Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? will be updated in another PR - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Unit Test cases tested and added in this PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/praveenmeenakshi56/carbondata local_dict1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2390.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2390 ---- commit ee3eeaf008ed14d32f10ab69f84ade4494bf522b Author: praveenmeenakshi56 <praveenmeenakshi56@...> Date: 2018-06-20T22:03:11Z Added validations for create table command with complex dataType columns for Local Dictionary Support ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2390 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6429/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2390 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5260/ --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2390 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2390 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6432/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2390 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5263/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2390 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5366/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2390 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5367/ --- |
In reply to this post by qiuchenjian-2
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2390#discussion_r197019547 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -400,6 +403,34 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { tableComment) } + /** + * this method validates whether any of the child columns of complex dataType column + * is of type string. + */ + def validateChildColumns(field: Field): Boolean = { + var stringColumnCount = 0 + def validateChildColumnsRecursively(field: Field): Unit = { + if (field.children.isDefined && null != field.children.get) { + field.children.get.foreach { childColumn => + if (childColumn.children.isDefined && null != childColumn.children.get) { + validateChildColumnsRecursively(childColumn) + } + else { + if (childColumn.dataType.get.equalsIgnoreCase("string")) { + stringColumnCount += 1 + } + } + } + } + } + --- End diff -- remove empty line here and add a proper comment --- |
In reply to this post by qiuchenjian-2
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2390#discussion_r197019592 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -444,16 +476,30 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { throw new MalformedCarbonCommandException(errormsg) } } + + // Validate whether any of the child columns of complex dataType column is a string column + localDictColumns.foreach { dictColm => + if (fields + .exists(x => x.column.equalsIgnoreCase(dictColm) && x.children.isDefined && + null != x.children.get && + !validateChildColumns(x))) { + val errMsg = "No child column is string dataType column in local_dictionary_include." --- End diff -- error message is not correct, give a proper error message --- |
In reply to this post by qiuchenjian-2
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2390#discussion_r197020746 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -444,16 +476,30 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { throw new MalformedCarbonCommandException(errormsg) } } + + // Validate whether any of the child columns of complex dataType column is a string column + localDictColumns.foreach { dictColm => + if (fields + .exists(x => x.column.equalsIgnoreCase(dictColm) && x.children.isDefined && + null != x.children.get && + !validateChildColumns(x))) { + val errMsg = "No child column is string dataType column in local_dictionary_include." + throw new MalformedCarbonCommandException(errMsg) + } + } + // check if the same column is present in both dictionary include and local dictionary columns // configuration if (tableProperties.get(CarbonCommonConstants.DICTIONARY_INCLUDE).isDefined) { + val dictExcludeCoumns = tableProperties.get(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE) --- End diff -- remove this check, because if same column is contained in dictioanry include, local dictionary include, local dictionary exclude, any two will be validated and error will be thrown, better not to check all the three properties --- |
In reply to this post by qiuchenjian-2
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2390#discussion_r197019082 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/localdictionary/LocalDictionarySupportCreateTableTest.scala --- @@ -723,16 +721,19 @@ class LocalDictionarySupportCreateTableTest extends QueryTest with BeforeAndAfte "is configured _005") { sql("drop table if exists local1") - intercept[MalformedCarbonCommandException] { + val exception = intercept[MalformedCarbonCommandException] { sql( """ | CREATE TABLE local1(id int, name string, city string, age int) | STORED BY 'org.apache.carbondata.format' | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,city', - | 'local_dictionary_exclude'='name') + | 'local_dictionary_exclude'='city') --- End diff -- better to change exiting test case only for case sensitive check --- |
In reply to this post by qiuchenjian-2
Github user akashrn5 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2390#discussion_r197019432 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/localdictionary/LocalDictionarySupportCreateTableTest.scala --- @@ -2096,6 +2093,175 @@ class LocalDictionarySupportCreateTableTest extends QueryTest with BeforeAndAfte } } + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _001") + { + sql("drop table if exists local1") + val exception = intercept[MalformedCarbonCommandException] { + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st array<struct<i:int,s:int>>) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st') + """.stripMargin) + } + assert(exception.getMessage + .contains( + "No child column is string dataType column in local_dictionary_include.")) + } + + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _002") + { + sql("drop table if exists local1") + val exception = intercept[MalformedCarbonCommandException] { + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st string) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st,city') + """.stripMargin) + } + assert(exception.getMessage + .contains( + "No child column is string dataType column in local_dictionary_include.")) + } + + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _003") + { + sql("drop table if exists local1") + val exception = intercept[MalformedCarbonCommandException] { + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st struct<i:int,s:int>) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st') + """.stripMargin) + } + assert(exception.getMessage + .contains( + "No child column is string dataType column in local_dictionary_include.")) + } + + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _004") + { + sql("drop table if exists local1") + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st struct<i:int,s:string>) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st') + """.stripMargin) + val descLoc = sql("describe formatted local1").collect + descLoc.find(_.get(0).toString.contains("Local Dictionary Threshold")) match { + case Some(row) => assert(row.get(1).toString.contains("1000")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Enabled")) match { + case Some(row) => assert(row.get(1).toString.contains("true")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Include")) match { + case Some(row) => assert(row.get(1).toString.contains("name,st")) + } + } + + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _005") + { + sql("drop table if exists local1") + sql( + """ + | CREATE TABLE local1(id int, name string,city array<string>, st struct<i:int,s:int>) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,city') + """.stripMargin) + val descLoc = sql("describe formatted local1").collect + descLoc.find(_.get(0).toString.contains("Local Dictionary Threshold")) match { + case Some(row) => assert(row.get(1).toString.contains("1000")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Enabled")) match { + case Some(row) => assert(row.get(1).toString.contains("true")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Include")) match { + case Some(row) => assert(row.get(1).toString.contains("name,city")) + } + } + + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _006") + { + sql("drop table if exists local1") + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st struct<i:int,s:array<string>>) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st') + """.stripMargin) + val descLoc = sql("describe formatted local1").collect + descLoc.find(_.get(0).toString.contains("Local Dictionary Threshold")) match { + case Some(row) => assert(row.get(1).toString.contains("1000")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Enabled")) match { + case Some(row) => assert(row.get(1).toString.contains("true")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Include")) match { + case Some(row) => assert(row.get(1).toString.contains("name,st")) + } + } + + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _007") + { + sql("drop table if exists local1") + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st struct<i:int,s:struct<si:string>>) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st') + """.stripMargin) + val descLoc = sql("describe formatted local1").collect + descLoc.find(_.get(0).toString.contains("Local Dictionary Threshold")) match { + case Some(row) => assert(row.get(1).toString.contains("1000")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Enabled")) match { + case Some(row) => assert(row.get(1).toString.contains("true")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Include")) match { + case Some(row) => assert(row.get(1).toString.contains("name,st")) + } + } + + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _008") + { + sql("drop table if exists local1") + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st array<struct<si:string>>) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st') + """.stripMargin) + val descLoc = sql("describe formatted local1").collect + descLoc.find(_.get(0).toString.contains("Local Dictionary Threshold")) match { + case Some(row) => assert(row.get(1).toString.contains("1000")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Enabled")) match { + case Some(row) => assert(row.get(1).toString.contains("true")) + } + descLoc.find(_.get(0).toString.contains("Local Dictionary Include")) match { + case Some(row) => assert(row.get(1).toString.contains("name,st")) + } + } + --- End diff -- add one more test case, where two complex columns are there and one is only with int datatype child columns --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2390 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6435/ --- |
In reply to this post by qiuchenjian-2
Github user praveenmeenakshi56 commented on the issue:
https://github.com/apache/carbondata/pull/2390 retest this please --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2390#discussion_r197027343 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -400,6 +403,39 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { tableComment) } + /** + * this method validates whether any of the child columns of complex dataType column + * is of type string. + */ + def validateChildColumns(field: Field): Boolean = { --- End diff -- can you change the logic to return true as soon as you find any column with string datatype. No need to check other columns. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2390 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5266/ --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2390#discussion_r197028165 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/localdictionary/LocalDictionarySupportCreateTableTest.scala --- @@ -2096,6 +2093,194 @@ class LocalDictionarySupportCreateTableTest extends QueryTest with BeforeAndAfte } } + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _001") + { + sql("drop table if exists local1") + val exception = intercept[MalformedCarbonCommandException] { + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st array<struct<i:int,s:int>>) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st') + """.stripMargin) + } + assert(exception.getMessage + .contains( + "Child columns do not exist for the specified complex dataType column(s) in local_dictionary_include.")) + } + + test( + "test local dictionary custom configurations when complex dataType columns are given in " + + "local_dictionary_include _002") + { + sql("drop table if exists local1") + val exception = intercept[MalformedCarbonCommandException] { + sql( + """ + | CREATE TABLE local1(id int, name string,city array<int>, st string) + | STORED BY 'org.apache.carbondata.format' + | tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,st,city') + """.stripMargin) + } + assert(exception.getMessage + .contains( + "Child columns do not exist for the specified complex dataType column(s) in local_dictionary_include.")) + } + + test( --- End diff -- i think local_dictionary_include _001 has already covered this scenario. Please remove all duplicate test cases. --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2390#discussion_r197029663 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -444,16 +481,29 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { throw new MalformedCarbonCommandException(errormsg) } } + + // Validate whether any of the child columns of complex dataType column is a string column + localDictColumns.foreach { dictColm => + if (fields + .exists(x => x.column.equalsIgnoreCase(dictColm) && x.children.isDefined && + null != x.children.get && + !validateChildColumns(x))) { + val errMsg = "Child columns do not exist for the specified complex dataType column(s) in local_dictionary_include." --- End diff -- Throw proper error message. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2390 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6439/ --- |
Free forum by Nabble | Edit this page |