Indhumathi27 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-816453149

@akkio-97 For alter add columns, we currently support a property to provide a default value in the ALTER ADD DDL, like "DEFAULT.VALUE.column_name=somevalue". Please check and handle this scenario as well.
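For reference, a minimal sketch of that property in use, written in the same Scala test style as the PR (the table and column names here are made up for illustration):

  sql("ALTER TABLE alter_com ADD COLUMNS(newCol STRING) " +
    "TBLPROPERTIES('DEFAULT.VALUE.newCol'='somevalue')")
  // rows in segments loaded before the ALTER should now show 'somevalue'
  // for newCol instead of null
  sql("select newCol from alter_com").show(false)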
akkio-97 commented on a change in pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#discussion_r610393520

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala

@@ -275,6 +275,56 @@ class AlterTableColumnSchemaGenerator(
       allColumns ++= Seq(columnSchema)
     }
     newCols ++= Seq(columnSchema)
+    if(field.dataType== Some("Array") ) {
+      columnSchema.setNumberOfChild(field.children.size)
+      val childField = field.children.get(0)
+      val childSchema: ColumnSchema = TableNewProcessor.createColumnSchema(
+        childField,

Review comment: Right, thrift is updated.
CarbonDataQA2 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-817556023

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5148/
CarbonDataQA2 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-817557252

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3397/
CarbonDataQA2 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-817701999

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3400/
CarbonDataQA2 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-817705872

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5152/
akashrn5 commented on a change in pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#discussion_r613021868

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala

@@ -275,6 +275,56 @@ class AlterTableColumnSchemaGenerator(
       allColumns ++= Seq(columnSchema)
     }
     newCols ++= Seq(columnSchema)
+    if(field.dataType== Some("Array") ) {
+      columnSchema.setNumberOfChild(field.children.size)
+      val childField = field.children.get(0)
+      val childSchema: ColumnSchema = TableNewProcessor.createColumnSchema(
+        childField,
+        new java.util.ArrayList[Encoding](),
+        isDimensionCol = true,
+        childField.precision,
+        childField.scale,
+        childField.schemaOrdinal + currentSchemaOrdinal,
+        alterTableModel.highCardinalityDims,
+        alterTableModel.databaseName.getOrElse(dbName),
+        isSortColumn(childField.name.getOrElse(childField.column)),
+        isVarcharColumn(childField.name.getOrElse(childField.column)))
+
+      if (childSchema.getDataType == DataTypes.VARCHAR) {
+        // put the new long string columns in 'longStringCols'
+        // and add them after old long string columns
+        longStringCols ++= Seq(childSchema)
+      } else {
+        allColumns ++= Seq(childSchema)
+      }
+      newCols ++= Seq(childSchema)
+    } else if (field.dataType == Some("Struct")) {

Review comment: same as above

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala

@@ -275,6 +275,56 @@ class AlterTableColumnSchemaGenerator(
       allColumns ++= Seq(columnSchema)
     }
     newCols ++= Seq(columnSchema)
+    if(field.dataType== Some("Array") ) {

Review comment: Instead of hardcoding, please use existing APIs like `org.apache.carbondata.core.metadata.datatype.DataTypes#isArrayType`.

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala

@@ -85,6 +85,8 @@ private[sql] case class CarbonAlterTableAddColumnCommand(
     timeStamp = System.currentTimeMillis
     val schemaEvolutionEntry = new org.apache.carbondata.core.metadata.schema.SchemaEvolutionEntry
     schemaEvolutionEntry.setTimeStamp(timeStamp)
+    // filter out complex children columns
+    newCols = newCols.filter(x => !x.isComplexColumn)
     schemaEvolutionEntry.setAdded(newCols.toList.asJava)

Review comment: When you add the column schemas as "added" in the schema evolution entry and you are adding only the parent, make sure that the parent column schema carries all of its child columns. Please confirm whether this is already the case; if not, please handle it.

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala

@@ -534,10 +581,10 @@ object TableNewProcessor {
     if (highCardinalityDims.contains(colName)) {
       encoders.remove(Encoding.DICTIONARY)
     }
-    if (dataType == DataTypes.DATE) {
-      encoders.add(Encoding.DICTIONARY)
-      encoders.add(Encoding.DIRECT_DICTIONARY)
-    } else if (dataType == DataTypes.TIMESTAMP && !highCardinalityDims.contains(colName)) {
+    // complex children does not require encoding
+    if (!colName.contains('.') && (dataType == DataTypes.DATE ||

Review comment: Use a static constant for the point ('.') character.

File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala

@@ -683,12 +683,6 @@ object CarbonSparkSqlParserUtil {
       fields: List[Field],
       tblProp: Option[List[(String, String)]]
   ): CarbonAlterTableAddColumnCommand = {
-    fields.foreach { f =>

Review comment: I think, since you are supporting only single-level complex columns, you should throw an exception if the user tries to add a multi-level one. Wherever you handle this, please add a TODO mentioning the JIRA ID you have already created for supporting multi-level columns.

File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableAddColumns.scala

@@ -136,6 +140,209 @@ class TestAlterTableAddColumns extends QueryTest with BeforeAndAfterAll {
     sql(s"""DROP TABLE IF EXISTS ${ tableName }""")
   }

+  test("Test adding of array of all primitive datatypes") {
+    sql("DROP TABLE IF EXISTS alter_com")
+    sql("CREATE TABLE alter_com(intfield int) STORED AS carbondata")
+    sql(
+      "ALTER TABLE alter_com ADD COLUMNS(arr1 array<short>, arr2 array<int>, arr3 " +
+      "array<long>, arr4 array<double>, arr5 array<decimal(8,2)>, arr6 array<string>, arr7 " +
+      "array<char(5)>, arr8 array<varchar(50)>, arr9 array<boolean>, arr10 array<date>, arr11 " +
+      "array<timestamp> )")
+    sql("desc table alter_com").show(false)
+    sql(
+      "insert into alter_com values(1,array(1,5),array(1,2),array(1,2,3),array(1.2d,2.3d),array" +
+      "(4.5,6.7),array('hello','world'),array('a','bcd'),array('abcd','efg'),array(true,false)," +
+      "array('2017-02-01','2018-09-11'),array('2017-02-01 00:01:00','2018-02-01 02:21:00') )")
+    checkAnswer(sql(
+      "select * from alter_com"),
+      Seq(Row(1, make(Array(1, 5)),
+        make(Array(1, 2)),
+        make(Array(1, 2, 3)),
+        make(Array(1.2, 2.3)),
+        make(Array(java.math.BigDecimal.valueOf(4.5).setScale(2),
+          java.math.BigDecimal.valueOf(6.7).setScale(2))),
+        make(Array("hello", "world")),
+        make(Array("a", "bcd")),
+        make(Array("abcd", "efg")),
+        make(Array(true, false)),
+        make(Array(Date.valueOf("2017-02-01"), Date.valueOf("2018-09-11"))),
+        make(Array(Timestamp.valueOf("2017-02-01 00:01:00"),
+          Timestamp.valueOf("2018-02-01 02:21:00")))
+      )))
+    sql("DROP TABLE IF EXISTS alter_com")
+  }
+
+  test("Test adding of struct of all primitive datatypes") {
+    sql("DROP TABLE IF EXISTS alter_com")
+    sql(
+      "CREATE TABLE alter_com(intField INT, structField struct<a:short,b:int,c:long,d:double," +
+      "e:decimal(8,2),f:string,g:char(5),h:varchar(50),i:boolean,j:date,k:timestamp>) STORED AS " +
+      "carbondata")
+    sql(
+      "insert into alter_com values(1, named_struct('a',1,'b',2,'c',3,'d',1.23,'e',2.34,'f'," +
+      "'hello','g','abc','h','def','i',true,'j','2017-02-01','k','2018-02-01 02:00:00.0') ) ")
+    sql("select * from alter_com").show(false)
+    sql("DROP TABLE IF EXISTS alter_com")
+  }
+
+  def insertIntoTableForArrayType(): Unit = {
+    sql("insert into alter_com values(4,array(2),array(1,2,3,4),array('abc','def'))")
+    sql("insert into alter_com values(5,array(1,2),array(1), array('Hulk','Thor'))")
+    sql(
+      "insert into alter_com values(6,array(1,2,3),array(1234,8,33333),array('Iron','Man'," +
+      "'Jarvis'))")
+  }
+
+  def checkRestulForArrayType(): Unit = {
+    val totalRows = sql("select * from alter_com").collect()
+    val a = sql("select * from alter_com where array_contains(arr2,2)").collect
+    val b = sql("select * from alter_com where array_contains(arr3,'Thor')").collect
+    val c = sql("select * from alter_com where intField = 1").collect
+    assert(totalRows.size == 6)
+    assert(a.size == 1)
+    assert(b.size == 1)
+    // check default value for newly added array column that is index - 3 and 4
+    assert(c(0)(2) == null && c(0)(3) == null)
+  }
+
+  test("Test alter add for arrays enabling local dictionary") {
+    sql("DROP TABLE IF EXISTS alter_com")
+    sql(
+      "CREATE TABLE alter_com(intField INT, arr1 array<int>) " +
+      "STORED AS carbondata TBLPROPERTIES('local_dictionary_enable'='true')")
+
+    sql("insert into alter_com values(1,array(1) )")
+    sql("insert into alter_com values(2,array(9,0) )")
+    sql("insert into alter_com values(3,array(11,12,13) )")
+
+    sql(
+      "ALTER TABLE alter_com ADD COLUMNS(arr2 array<int>, arr3 array<string>) TBLPROPERTIES" +
+      "('LOCAL_DICTIONARY_INCLUDE'='arr3')")
+    val schema = sql("describe alter_com").collect()
+    assert(schema.size == 4)
+
+    // For the previous segments the default value for newly added array column is null
+    sql("insert into alter_com values(4,array(2),array(1,2,3,4),array('abc','def'))")
+    sql("insert into alter_com values(5,array(1,2),array(1), array('Hulk','Thor'))")
+    sql(
+      "insert into alter_com values(6,array(1,2,3),array(1234,8,33333),array('Iron','Man'," +
+      "'Jarvis'))")
+
+    val totalRows = sql("select * from alter_com").collect()
+    val a = sql("select * from alter_com where array_contains(arr2,2)").collect
+    val b = sql("select * from alter_com where array_contains(arr3,'Thor')").collect
+    val c = sql("select * from alter_com where intField = 1").collect
+    assert(totalRows.size == 6)
+    assert(a.size == 1)
+    assert(b.size == 1)
+    // check default value for newly added array column that is index - 3 and 4
+    assert(c(0)(2) == null && c(0)(3) == null)
+    sql("DROP TABLE IF EXISTS alter_com")
+
+  }
+
+  test("Test alter add for arrays disabling local dictionary") {
+    sql("DROP TABLE IF EXISTS alter_com")
+    sql(
+      "CREATE TABLE alter_com(intField INT, arr1 array<int>) " +
+      "STORED AS carbondata TBLPROPERTIES('local_dictionary_enable'='false')")
+
+    sql("insert into alter_com values(1,array(1) )")
+    sql("insert into alter_com values(2,array(9,0) )")
+    sql("insert into alter_com values(3,array(11,12,13) )")
+    sql(
+      "ALTER TABLE alter_com ADD COLUMNS(arr2 array<int>, arr3 array<string>) TBLPROPERTIES" +
+      "('LOCAL_DICTIONARY_EXCLUDE'='arr3')")
+    val schema = sql("describe alter_com").collect()
+    assert(schema.size == 4)
+
+    // For the previous segments the default value for newly added array column is null
+    sql("insert into alter_com values(4,array(2),array(1,2,3,4),array('abc','def'))")
+    sql("insert into alter_com values(5,array(1,2),array(1), array('Hulk','Thor'))")
+    sql(
+      "insert into alter_com values(6,array(1,2,3),array(1234,80,3333),array('Iron','Man'," +
+      "'Jarvis'))")
+
+    val totalRows = sql("select * from alter_com").collect()
+    val a = sql("select * from alter_com where array_contains(arr2,2)").collect
+    val b = sql("select * from alter_com where array_contains(arr3,'Thor')").collect
+    val c = sql("select * from alter_com where intField = 1").collect
+    assert(totalRows.size == 6)
+    assert(a.size == 1)
+    assert(b.size == 1)
+    // check default value for newly added array column that is index - 3 and 4
+    assert(c(0)(2) == null && c(0)(3) == null)
+    sql("DROP TABLE IF EXISTS alter_com")
+
+  }
+
+  def insertIntoTableForStructType(): Unit = {
+    sql(
+      "insert into alter_struct values(2, named_struct('id1', 'id2','name1','name2'), " +
+      "named_struct('a','id2','b', 'abc2'), 'hello world', 5, named_struct('c','id3'," +
+      "'d', 'abc3','e', 22), array(1,2,3) )")
+    sql(
+      "insert into alter_struct values(3, named_struct('id1', 'id3','name1','name3'), " +
+      "named_struct('a','id2.1','b', 'abc2.1'), 'India', 5, named_struct('c','id3.1'," +
+      "'d', 'abc3.1','e', 25), array(4,5) )")
+
+  }
+
+  def checkResultForStructType(): Unit = {
+    val totalRows = sql("select * from alter_struct").collect()
+    val a = sql("select * from alter_struct where struct1.a = 'id2' ").collect()
+    val b = sql(
+      "select * from alter_struct where struct1.a = 'id2' or struct2.c = 'id3.1' or " +
+      "array_contains(arr,5) ").collect()
+    val c = sql("select * from alter_struct where roll = 1").collect()
+
+    assert(totalRows.size == 3)
+    assert(a.size == 1)
+    assert(b.size == 2)
+    // check default value for newly added struct column
+    assert(c(0)(2) == null)
+  }
+
+  test("Test alter add for structs enabling local dictionary") {
+    sql("DROP TABLE IF EXISTS alter_struct")
+    sql(
+      "create table alter_struct(roll int, department struct<id1:string,name1:string>) STORED AS " +
+      "carbondata TBLPROPERTIES('local_dictionary_enable'='true')")
+    sql(
+      "insert into alter_struct values(1, named_struct('id1', 'id1','name1','name1'))")
+    sql(
+      "ALTER TABLE alter_struct ADD COLUMNS(struct1 struct<a:string,b:string>, temp string," +
+      " intField int, struct2 struct<c:string,d:string,e:int>, arr array<int>) TBLPROPERTIES " +
+      "('LOCAL_DICTIONARY_INCLUDE'='struct1, struct2')")
+    val schema = sql("describe alter_struct").collect()
+    assert(schema.size == 7)
+    // For the previous segments the default value for newly added struct column is null
+    insertIntoTableForStructType
+    checkResultForStructType
+    sql("DROP TABLE IF EXISTS alter_struct")
+
+  }
+
+  test("Test alter add for structs, disabling local dictionary") {
+    sql("DROP TABLE IF EXISTS alter_struct")
+    sql(
+      "create table alter_struct(roll int, department struct<id1:string,name1:string>) STORED AS " +
+      "carbondata TBLPROPERTIES('local_dictionary_enable'='false')")
+    sql(
+      "insert into alter_struct values(1, named_struct('id1', 'id1','name1','name1'))")
+    sql(
+      "ALTER TABLE alter_struct ADD COLUMNS(struct1 struct<a:string,b:string>, temp string," +
+      " intField int, struct2 struct<c:string,d:string,e:int>, arr array<int>) TBLPROPERTIES " +
+      "('LOCAL_DICTIONARY_EXCLUDE'='struct1, struct2')")
+    val schema = sql("describe alter_struct").collect()
+    assert(schema.size == 7)
+    // For the previous segments the default value for newly added struct column is null
+    insertIntoTableForStructType
+    checkResultForStructType
+    sql("DROP TABLE IF EXISTS alter_struct")
+
+  }
+

Review comment: Also add a test case to validate adding a multi-level complex column.
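A minimal sketch of the `DataTypes#isArrayType` suggestion above; the step that resolves the parser's string data type ("Array"/"Struct") to a core DataType is an assumption, and the helper shown may differ from what the PR ends up using:

  import org.apache.carbondata.core.metadata.datatype.DataTypes
  // hypothetical helper: resolve the field's string type to a core DataType
  val carbonType = DataTypeConverterUtil.convertToCarbonType(field.dataType.getOrElse(""))
  if (DataTypes.isArrayType(carbonType)) {
    // create and register the single child column schema
  } else if (DataTypes.isStructType(carbonType)) {
    // create and register one child column schema per struct member
  }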
akashrn5 commented on a change in pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#discussion_r613037958

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala

@@ -275,6 +275,56 @@ class AlterTableColumnSchemaGenerator(
       allColumns ++= Seq(columnSchema)
     }
     newCols ++= Seq(columnSchema)
+    if(field.dataType== Some("Array") ) {

Review comment: Also, please correct the code style: use Ctrl + Alt + Shift + L with the CarbonData style config in all the places you have changed.
akkio-97 commented on a change in pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#discussion_r613397041

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala

@@ -85,6 +85,8 @@ private[sql] case class CarbonAlterTableAddColumnCommand(
     timeStamp = System.currentTimeMillis
     val schemaEvolutionEntry = new org.apache.carbondata.core.metadata.schema.SchemaEvolutionEntry
     schemaEvolutionEntry.setTimeStamp(timeStamp)
+    // filter out complex children columns
+    newCols = newCols.filter(x => !x.isComplexColumn)
     schemaEvolutionEntry.setAdded(newCols.toList.asJava)

Review comment: We only maintain the number of children in each parent schema. Each column schema is independent; hence it is a flat structure.
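Illustratively, for a newly added column arr2 array<int>, that flat layout would produce two independent schema entries (a hedged sketch; the ".val" child-naming convention shown here is an assumption):

  // parent entry: records only its child count, no nested schemas
  val parentSchema = new ColumnSchema()
  parentSchema.setColumnName("arr2")
  parentSchema.setNumberOfChild(1)
  // child entry: a separate, flat schema entry alongside the parent
  val childSchema = new ColumnSchema()
  childSchema.setColumnName("arr2.val")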
akkio-97 commented on a change in pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#discussion_r613397286

File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala (@@ -683,12 +683,6 @@, quoted above)
Review comment: done

File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableAddColumns.scala (@@ -136,6 +140,209 @@, quoted above)
Review comment: done

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala (@@ -275,6 +275,56 @@ and @@ -534,10 +581,10 @@, quoted above; four threads)
Review comment: done

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala (@@ -275,6 +275,56 @@, on the createColumnSchema call for the child field)
Review comment: Sorting is not supported for complex columns.
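As context for that last reply, a hedged illustration: complex types cannot participate in SORT_COLUMNS, so a DDL like the following is expected to be rejected (the table name is made up):

  // expected to fail: complex columns are not allowed as sort columns
  sql("CREATE TABLE sort_check(id int, arr array<int>) STORED AS carbondata " +
    "TBLPROPERTIES('SORT_COLUMNS'='arr')")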
akkio-97 commented on a change in pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#discussion_r613397852

File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala

@@ -43,6 +43,83 @@ class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
     sql("drop table if exists complextable5")
   }

+  test("Test alter add array column before creating SI") {
+    sql("drop table if exists complextable")
+    sql("create table complextable (id string, country array<string>, name string) stored as carbondata")
+    sql("insert into complextable select 1,array('china', 'us'), 'b'")
+    sql("insert into complextable select 2,array('pak'), 'v'")
+
+    sql("drop index if exists index_11 on complextable")
+    sql(
+      "ALTER TABLE complextable ADD COLUMNS(arr2 array<string>)")
+    sql("insert into complextable select 3,array('china'), 'f',array('hello','world')")
+    sql("insert into complextable select 4,array('india'),'g',array('iron','man','jarvis')")
+
+    val result1 = sql("select * from complextable where array_contains(arr2,'iron')")
+    val result2 = sql("select * from complextable where arr2[0]='iron'")
+    sql("create index index_11 on table complextable(arr2) as 'carbondata'")
+    val df1 = sql(" select * from complextable where array_contains(arr2,'iron')")
+    val df2 = sql(" select * from complextable where arr2[0]='iron'")
+    if (!isFilterPushedDownToSI(df1.queryExecution.sparkPlan)) {
+      assert(false)
+    } else {
+      assert(true)
+    }
+    if (!isFilterPushedDownToSI(df2.queryExecution.sparkPlan)) {
+      assert(false)
+    } else {
+      assert(true)
+    }
+    val doNotHitSIDf = sql(" select * from complextable where array_contains(arr2,'iron') and array_contains(arr2,'man')")
+    if (isFilterPushedDownToSI(doNotHitSIDf.queryExecution.sparkPlan)) {
+      assert(false)
+    } else {
+      assert(true)
+    }
+    checkAnswer(result1, df1)
+    checkAnswer(result2, df2)
+  }
+
+  test("Test alter add array column after creating SI") {
+    sql("drop table if exists complextable")
+    sql("create table complextable (id string, country array<string>, name string) stored as carbondata")
+    sql("insert into complextable select 1,array('china', 'us'), 'b'")
+    sql("insert into complextable select 2,array('pak'), 'v'")
+
+    sql("drop index if exists index_11 on complextable")
+    sql("create index index_11 on table complextable(country) as 'carbondata'")
+
+    sql(

Review comment: done

File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/alterTable/TestAlterTableAddColumns.scala (@@ -136,6 +137,155 @@, an earlier revision of the alter_com/alter_struct tests quoted above)
Review comment: done
akkio-97 commented on a change in pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#discussion_r613398079

File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala

@@ -275,6 +275,56 @@ class AlterTableColumnSchemaGenerator(
       allColumns ++= Seq(columnSchema)
     }
     newCols ++= Seq(columnSchema)
+    if(field.dataType== Some("Array") ) {
+      columnSchema.setNumberOfChild(field.children.size)
+      val childField = field.children.get(0)

Review comment: done

File path: core/src/main/java/org/apache/carbondata/core/scan/executor/util/QueryUtil.java

@@ -280,6 +280,9 @@ public static void fillQueryDimensionChunkIndexes(
   private static void fillParentDetails(Map<Integer, Integer> dimensionToBlockIndexMap,
       CarbonDimension dimension, Map<Integer, GenericQueryType> complexTypeMap) {
+    if (!dimensionToBlockIndexMap.containsKey(dimension.getOrdinal())) {

Review comment: done
akkio-97 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-819673737

> @akkio-97 For alter add columns, we currently support a property to provide a default value in the ALTER ADD DDL, like "DEFAULT.VALUE.column_name=somevalue". Please check and handle this scenario as well.

1) This is a part of the query flow, because the assigned default data is inserted into the row only when a select command is fired.
2) The data we assign as the default is not parsed through the QueryUtil flow and is just a byte array. This data is not encoded, so the existing parsers and decoders cannot be used here to extract the actual data and insert it into the row.
3) We need to first write a new parser and then create an ArrayQueryType and a PrimitiveQueryType, after which the insertion can be done.
4) I think this can be handled in a separate PR.
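For concreteness, the scenario under discussion would look roughly like this (a hedged sketch; the value format for a complex-type default is exactly the open question above, so the property value shown is illustrative only):

  sql("ALTER TABLE alter_com ADD COLUMNS(arr2 array<int>) " +
    "TBLPROPERTIES('DEFAULT.VALUE.arr2'='1,2,3')")  // value format is an assumption
  // at query time, the stored default bytes would need the new parser plus
  // ArrayQueryType/PrimitiveQueryType instances described above before they
  // could be filled into result rows
  sql("select arr2 from alter_com").show(false)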
CarbonDataQA2 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-819736041

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5182/
CarbonDataQA2 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-819740885

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3430/
akkio-97 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-819783720

retest this please
CarbonDataQA2 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-819837007

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5183/
CarbonDataQA2 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-819839099

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3431/
Indhumathi27 commented on pull request #4115: URL: https://github.com/apache/carbondata/pull/4115#issuecomment-820083693

retest this please