ShreelekhyaG opened a new pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113

### Why is this PR needed?
Currently, DESCRIBE FORMATTED displays the column information of a table along with some additional information. When complex types such as ARRAY, STRUCT, and MAP are present in the table, a column definition can become long and is difficult to read in its nested form.

### What changes were proposed in this PR?
For complex types, the DESCRIBE output can be formatted to avoid long lines for multiple fields. The name of a complex field can be passed to the command to visualize its structure as if it were a table.

DDL Commands:
```
DESCRIBE COLUMN fieldname ON [db_name.]table_name;
DESCRIBE short [db_name.]table_name;
```

### Does this PR introduce any user interface change?
- Yes

### Is any new testcase added?
- Yes

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]
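To make the two proposed DDL forms concrete, here is a minimal, Spark-free sketch of how such statements could be recognized and split into a dotted field path plus an optional database name. `DescribeStmt`, `DescribeColumn`, `DescribeShort`, and `DescribeDdl` are hypothetical names; CarbonData's real grammar is written with Scala parser combinators (the `Parser[LogicalPlan]` rules in `CarbonSpark2SqlParser`), not regular expressions. Accepting `DESC` as well as `DESCRIBE`, and dotted nested names, mirrors the `[DESCRIBE | DESC] COLUMN fieldname[.nestedFieldNames]` form discussed in the review.

```scala
// Hypothetical AST for the two proposed statements (not CarbonData's classes).
sealed trait DescribeStmt
final case class DescribeColumn(fieldPath: List[String], db: Option[String], table: String)
  extends DescribeStmt
final case class DescribeShort(db: Option[String], table: String) extends DescribeStmt

object DescribeDdl {
  private val Ident = "[A-Za-z_][A-Za-z0-9_]*"

  // (?i) = case-insensitive keywords; DESC(?:RIBE)? accepts DESC or DESCRIBE.
  // Groups: 1 = dotted field path, 2 = optional db name, 3 = table name.
  private val ColumnRe =
    raw"(?i)\s*DESC(?:RIBE)?\s+COLUMN\s+($Ident(?:\.$Ident)*)\s+ON\s+(?:($Ident)\.)?($Ident)\s*;?\s*".r
  // Groups: 1 = optional db name, 2 = table name.
  private val ShortRe =
    raw"(?i)\s*DESC(?:RIBE)?\s+SHORT\s+(?:($Ident)\.)?($Ident)\s*;?\s*".r

  // Regex extractors match the whole string; an absent optional group yields null.
  def parse(sql: String): Option[DescribeStmt] = sql match {
    case ColumnRe(path, db, table) =>
      Some(DescribeColumn(path.split('.').toList, Option(db), table))
    case ShortRe(db, table) =>
      Some(DescribeShort(Option(db), table))
    case _ => None
  }
}
```

For example, `DescribeDdl.parse("desc column arr.item on db1.t")` yields a `DescribeColumn` whose path is `List("arr", "item")`, which is the shape a command needs in order to walk into nested types one segment at a time.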
CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-809254008

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3349/
CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-809254470

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5100/
CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-811106207

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5103/
CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-811106947

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3352/
Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r607512702

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,52 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
+
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val sparkSession = SparkSQLUtil.getSparkSession
+    validateTableExists(databaseNameOp, tableName, sparkSession)
+    val relation = CarbonEnv
+      .getInstance(sparkSession)
+      .carbonMetaStore
+      .lookupRelation(databaseNameOp, tableName)(sparkSession)
+      .asInstanceOf[CarbonRelation]
+    val tableSchema = StructType.fromAttributes(relation.output)
+    val carbonTable = relation.carbonTable
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {
+      throw new MalformedCarbonCommandException(
+        s"${inputFields.get(0)} not present in schema of table: $tableName")
+    }
+    CarbonDescribeColumnCommand(
+      carbonTable,
+      inputFields,
+      inputColumn.get
+    )
+  }
+
+  def describeShort(
+      databaseNameOp: Option[String],
+      tableName: String
+  ): CarbonDescribeShortCommand = {
+    val sparkSession = SparkSQLUtil.getSparkSession
+    validateTableExists(databaseNameOp, tableName, sparkSession)

Review comment:
Can move the duplicate code in describeShort and describeColumn to a common method.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonExtensionSpark2SqlParser.scala
##########
@@ -27,7 +28,26 @@ import org.apache.spark.sql.catalyst.plans.logical._
 class CarbonExtensionSpark2SqlParser extends CarbonSpark2SqlParser {
   override protected lazy val extendedSparkSyntax: Parser[LogicalPlan] =
-    loadDataNew | alterTableAddColumns | explainPlan
+    loadDataNew | alterTableAddColumns | explainPlan | describeColumn | describeShort

Review comment:
I think there is no need to define these here, since CarbonExtensionSpark2SqlParser extends CarbonSpark2SqlParser.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
##########
@@ -370,3 +373,134 @@ private[sql] case class CarbonDescribeFormattedCommand(
   override protected def opName: String = "DESC FORMATTED"
 }
+
+case class CarbonDescribeColumnCommand(
+    carbonTable: CarbonTable,
+    inputFieldNames: java.util.List[String],
+    field: StructField)
+  extends MetadataCommand {
+
+  override val output: Seq[Attribute] = Seq(
+    // Column names are based on Hive.
+    AttributeReference("col_name", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "name of the column").build())(),
+    AttributeReference("data_type", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "data type of the column").build())(),
+    AttributeReference("comment", StringType, nullable = true,
+      new MetadataBuilder().putString("comment", "comment of the column").build())()
+  )
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+    setAuditTable(carbonTable)
+    var results = Seq[(String, String, String)]()
+    var currField = field
+    val inputFieldsIterator = inputFieldNames.iterator()
+    var inputColumn = inputFieldsIterator.next()
+    while (results.size == 0) {
+      breakable {
+        if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.ARRAY)) {
+          if (inputFieldsIterator.hasNext) {
+            val nextField = inputFieldsIterator.next()
+            if (!nextField.equalsIgnoreCase("item")) {
+              throw handleException(nextField, currField.name, carbonTable.getTableName)
+            }
+            currField = StructField("item", currField.dataType.asInstanceOf[ArrayType].elementType)
+            inputColumn += "." + currField.name
+            break()
+          }
+          val colComment = currField.getComment().getOrElse("null")
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")),
+            ("## Children of " + inputColumn + ": ", "", ""))
+          results ++= Seq(("item", currField.dataType.asInstanceOf[ArrayType]
+            .elementType.simpleString, colComment))

Review comment:
Will colComment, which is given for the parent column, also be displayed when describing the child columns?
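The loop under review walks `inputFieldNames` one segment at a time and, for an array, requires the next segment to be `item`. A minimal, Spark-free model of that walk might look like the following. It assumes map children are addressed as `key`/`value` and struct children by field name, matching the documented `channelsId` example; `DType`, `Field`, and `DescribeColumnModel` are hypothetical stand-ins for Spark's `DataType` and `StructField`, not CarbonData code. Each child row takes its own comment, so a parent's comment never appears on its children's rows.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait DType {
  def typeName: String     // short name shown for the described column itself
  def simpleString: String // full nested form shown for child rows
}
final case class Prim(name: String) extends DType {
  def typeName: String = name
  def simpleString: String = name
}
final case class ArrayT(element: DType) extends DType {
  def typeName: String = "array"
  def simpleString: String = s"array<${element.simpleString}>"
}
final case class MapT(key: DType, value: DType) extends DType {
  def typeName: String = "map"
  def simpleString: String = s"map<${key.simpleString},${value.simpleString}>"
}
final case class Field(name: String, dataType: DType, comment: Option[String] = None)
final case class StructT(fields: List[Field]) extends DType {
  def typeName: String = "struct"
  def simpleString: String =
    fields.map(f => s"${f.name}:${f.dataType.simpleString}").mkString("struct<", ",", ">")
}

object DescribeColumnModel {
  type Row = (String, String, String) // (col_name, data_type, comment)

  // Resolve one path segment inside a complex type; None if no such child exists.
  private def child(dt: DType, step: String): Option[Field] = dt match {
    case ArrayT(e) if step.equalsIgnoreCase("item")  => Some(Field("item", e))
    case MapT(k, _) if step.equalsIgnoreCase("key")   => Some(Field("key", k))
    case MapT(_, v) if step.equalsIgnoreCase("value") => Some(Field("value", v))
    case StructT(fs) => fs.find(_.name.equalsIgnoreCase(step))
    case _ => None
  }

  // Direct children of a complex type; each carries its own comment (or none).
  private def children(dt: DType): List[Field] = dt match {
    case ArrayT(e)   => List(Field("item", e))
    case MapT(k, v)  => List(Field("key", k), Field("value", v))
    case StructT(fs) => fs
    case _           => Nil
  }

  // `path` starts with the root column name, like inputFieldNames in the PR.
  def describe(root: Field, path: List[String]): Seq[Row] = {
    // Follow the remaining segments, building the dotted display name as we go.
    val (target, displayName) = path.drop(1).foldLeft((root, root.name)) {
      case ((cur, name), step) =>
        val next = child(cur.dataType, step).getOrElse(
          throw new IllegalArgumentException(s"$step is not a child of $name"))
        (next, s"$name.${next.name}")
    }
    val selfRow: Row = (displayName, target.dataType.typeName, target.comment.getOrElse("null"))
    val kids = children(target.dataType)
    if (kids.isEmpty) Seq(selfRow)
    else
      selfRow +: (s"## Children of $displayName: ", "", "") +:
        kids.map(k => (k.name, k.dataType.simpleString, k.comment.getOrElse("null")))
  }
}
```

Describing `channelsId` of type `map<string,string>` with this model produces the same four rows as the documented example, and a path such as `arr.item` descends into an array's element type before listing that element's children.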
Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r607518121

##########
File path: docs/ddl-of-carbondata.md
##########
@@ -646,6 +647,28 @@ CarbonData DDL statements are documented here,which includes:
 ## TABLE MANAGEMENT
+### DESCRIBE COMMAND
+
+Describe column of table and visualize its structure with child fields.
+  ```
+  DESCRIBE COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;
+
+  Example: DESCRIBE COLUMN channelsId ON carbonTable;
+  +----------------------------+---------+-------+
+  |col_name                    |data_type|comment|
+  +----------------------------+---------+-------+
+  |channelsId                  |map      |null   |
+  |## Children of channelsId:  |         |       |
+  |key                         |string   |null   |
+  |value                       |string   |null   |
+  +----------------------------+---------+-------+
+  ```
+
+This command is used to display short version of table columns.

Review comment:
```suggestion
This command is used to display short version of table complex columns.
```
Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r607534742

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/describeTable/TestDescribeTable.scala
##########
@@ -72,11 +84,131 @@ class TestDescribeTable extends QueryTest with BeforeAndAfterAll {
     assert(descPar.exists(_.toString().contains("Partition Parameters:")))
   }
+  test("test describe column field name") {
+    // describe primitive column
+    var desc = sql("describe column deviceInformationId on complexcarbontable").collect()
+    assert(desc(0).get(0).asInstanceOf[String].trim.equals("deviceInformationId"))
+    assert(desc(0).get(1).asInstanceOf[String].trim.equals("integer"))
+
+    // describe simple map
+    /*
+    +----------------------------+---------+-------+
+    |col_name                    |data_type|comment|
+    +----------------------------+---------+-------+
+    |channelsId                  |map      |null   |
+    |## Children of channelsId:  |         |       |
+    |key                         |string   |null   |
+    |value                       |string   |null   |
+    +----------------------------+---------+-------+
+    */
+    desc = sql("describe column channelsId on complexcarbontable").collect()

Review comment:
It would be better to support the short form of DESCRIBE as well. The DDL can be like [DESCRIBE|DESC] column <...>
CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-814179807

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5136/
CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-814194686

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3385/
ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r608342525

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/describeTable/TestDescribeTable.scala
##########
@@ -72,11 +84,131 @@ class TestDescribeTable extends QueryTest with BeforeAndAfterAll {
     assert(descPar.exists(_.toString().contains("Partition Parameters:")))
   }
+  test("test describe column field name") {
+    // describe primitive column
+    var desc = sql("describe column deviceInformationId on complexcarbontable").collect()
+    assert(desc(0).get(0).asInstanceOf[String].trim.equals("deviceInformationId"))
+    assert(desc(0).get(1).asInstanceOf[String].trim.equals("integer"))
+
+    // describe simple map
+    /*
+    +----------------------------+---------+-------+
+    |col_name                    |data_type|comment|
+    +----------------------------+---------+-------+
+    |channelsId                  |map      |null   |
+    |## Children of channelsId:  |         |       |
+    |key                         |string   |null   |
+    |value                       |string   |null   |
+    +----------------------------+---------+-------+
+    */
+    desc = sql("describe column channelsId on complexcarbontable").collect()

Review comment:
ok done
ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r608343704

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
##########
@@ -370,3 +373,134 @@ private[sql] case class CarbonDescribeFormattedCommand(
   override protected def opName: String = "DESC FORMATTED"
 }
+
+case class CarbonDescribeColumnCommand(
+    carbonTable: CarbonTable,
+    inputFieldNames: java.util.List[String],
+    field: StructField)
+  extends MetadataCommand {
+
+  override val output: Seq[Attribute] = Seq(
+    // Column names are based on Hive.
+    AttributeReference("col_name", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "name of the column").build())(),
+    AttributeReference("data_type", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "data type of the column").build())(),
+    AttributeReference("comment", StringType, nullable = true,
+      new MetadataBuilder().putString("comment", "comment of the column").build())()
+  )
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+    setAuditTable(carbonTable)
+    var results = Seq[(String, String, String)]()
+    var currField = field
+    val inputFieldsIterator = inputFieldNames.iterator()
+    var inputColumn = inputFieldsIterator.next()
+    while (results.size == 0) {
+      breakable {
+        if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.ARRAY)) {
+          if (inputFieldsIterator.hasNext) {
+            val nextField = inputFieldsIterator.next()
+            if (!nextField.equalsIgnoreCase("item")) {
+              throw handleException(nextField, currField.name, carbonTable.getTableName)
+            }
+            currField = StructField("item", currField.dataType.asInstanceOf[ArrayType].elementType)
+            inputColumn += "." + currField.name
+            break()
+          }
+          val colComment = currField.getComment().getOrElse("null")
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")),
+            ("## Children of " + inputColumn + ": ", "", ""))
+          results ++= Seq(("item", currField.dataType.asInstanceOf[ArrayType]
+            .elementType.simpleString, colComment))

Review comment:
No. Modified so that child columns don't display the parent column's comment.
ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r608343781

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonExtensionSpark2SqlParser.scala
##########
@@ -27,7 +28,26 @@ import org.apache.spark.sql.catalyst.plans.logical._
 class CarbonExtensionSpark2SqlParser extends CarbonSpark2SqlParser {
   override protected lazy val extendedSparkSyntax: Parser[LogicalPlan] =
-    loadDataNew | alterTableAddColumns | explainPlan
+    loadDataNew | alterTableAddColumns | explainPlan | describeColumn | describeShort

Review comment:
Done
ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r608343891

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,52 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
+
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val sparkSession = SparkSQLUtil.getSparkSession
+    validateTableExists(databaseNameOp, tableName, sparkSession)
+    val relation = CarbonEnv
+      .getInstance(sparkSession)
+      .carbonMetaStore
+      .lookupRelation(databaseNameOp, tableName)(sparkSession)
+      .asInstanceOf[CarbonRelation]
+    val tableSchema = StructType.fromAttributes(relation.output)
+    val carbonTable = relation.carbonTable
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {
+      throw new MalformedCarbonCommandException(
+        s"${inputFields.get(0)} not present in schema of table: $tableName")
+    }
+    CarbonDescribeColumnCommand(
+      carbonTable,
+      inputFields,
+      inputColumn.get
+    )
+  }
+
+  def describeShort(
+      databaseNameOp: Option[String],
+      tableName: String
+  ): CarbonDescribeShortCommand = {
+    val sparkSession = SparkSQLUtil.getSparkSession
+    validateTableExists(databaseNameOp, tableName, sparkSession)

Review comment:
done
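The review asked for the code shared by `describeColumn` and `describeShort` (validate that the table exists, then resolve it) to be moved into a common method. A minimal sketch of that refactoring follows, with a plain `Map` standing in for the Carbon metastore. `Column`, `TableInfo`, `MalformedCommandException`, `ParserUtilSketch`, `lookupTable`, and the error messages are all hypothetical; this is not the code that was merged.

```scala
// Hypothetical stand-ins for the relation/table objects used in the PR.
final case class Column(name: String, dataType: String)
final case class TableInfo(db: String, name: String, columns: List[Column])
final class MalformedCommandException(msg: String) extends RuntimeException(msg)

// `catalog` is keyed by lower-cased (database, table) names.
final class ParserUtilSketch(catalog: Map[(String, String), TableInfo], currentDb: String) {

  // The single shared step: default the database, check existence, resolve the table.
  private def lookupTable(dbOp: Option[String], table: String): TableInfo = {
    val db = dbOp.getOrElse(currentDb)
    catalog.getOrElse((db.toLowerCase, table.toLowerCase),
      throw new MalformedCommandException(s"Table $db.$table does not exist"))
  }

  // Assumes a non-empty path whose head is the top-level column name.
  def describeColumn(dbOp: Option[String], table: String, path: List[String]): (TableInfo, Column) = {
    val info = lookupTable(dbOp, table)
    val col = info.columns.find(_.name.equalsIgnoreCase(path.head)).getOrElse(
      throw new MalformedCommandException(s"${path.head} not present in schema of table: $table"))
    (info, col)
  }

  def describeShort(dbOp: Option[String], table: String): TableInfo = lookupTable(dbOp, table)
}
```

With the lookup in one place, both builders fail the same way for a missing table, and only `describeColumn` adds its own column-level check.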
Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r609333620

##########
File path: docs/ddl-of-carbondata.md
##########
@@ -646,6 +647,28 @@ CarbonData DDL statements are documented here,which includes:
 ## TABLE MANAGEMENT
+### DESCRIBE COMMAND
+
+Describe column of table and visualize its structure with child fields.
+  ```
+  DESCRIBE COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;

Review comment:
```suggestion
  [DESCRIBE | DESC] COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;
```
Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r609333779

##########
File path: docs/ddl-of-carbondata.md
##########
@@ -646,6 +647,28 @@ CarbonData DDL statements are documented here,which includes:
 ## TABLE MANAGEMENT
+### DESCRIBE COMMAND
+
+Describe column of table and visualize its structure with child fields.
+  ```
+  DESCRIBE COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;
+
+  Example: DESCRIBE COLUMN channelsId ON carbonTable;
+  +----------------------------+---------+-------+
+  |col_name                    |data_type|comment|
+  +----------------------------+---------+-------+
+  |channelsId                  |map      |null   |
+  |## Children of channelsId:  |         |       |
+  |key                         |string   |null   |
+  |value                       |string   |null   |
+  +----------------------------+---------+-------+
+  ```
+
+This command is used to display short version of table complex columns.
+  ```
+  DESCRIBE SHORT [db_name.]table_name;

Review comment:
Same comment as above.
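The documentation only states that `DESCRIBE SHORT` displays a short version of a table's complex columns. Assuming the short form keeps each complex column's top-level type name and collapses everything nested inside it to `<..>` (an assumption; the thread does not show this output), the rendering step could be sketched as follows. `DescribeShortSketch` is a hypothetical name, and the input is a Spark-style type string such as `map<string,string>`.

```scala
// Hypothetical sketch of DESCRIBE SHORT rendering; the "<..>" form is assumed.
object DescribeShortSketch {
  // Keep primitive types unchanged; for complex types keep only the outer type name.
  def shortType(typeString: String): String = {
    val lt = typeString.indexOf('<')
    if (lt < 0) typeString                  // e.g. "int" or "decimal(10,2)"
    else typeString.substring(0, lt) + "<..>" // e.g. "array<struct<...>>" -> "array<..>"
  }

  // Columns are (name, type string) pairs; only the type column is abbreviated.
  def describeShort(columns: Seq[(String, String)]): Seq[(String, String)] =
    columns.map { case (name, tpe) => (name, shortType(tpe)) }
}
```

Keeping only the outer type avoids the long nested definitions that motivated the PR, while `DESCRIBE COLUMN` remains the way to drill into a specific field.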
Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r609374954

##########
File path: docs/ddl-of-carbondata.md
##########
@@ -646,6 +647,28 @@ CarbonData DDL statements are documented here,which includes:
 ## TABLE MANAGEMENT
+### DESCRIBE COMMAND
+
+Describe column of table and visualize its structure with child fields.
+  ```
+  DESCRIBE COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;
+
+  Example: DESCRIBE COLUMN channelsId ON carbonTable;

Review comment:
Can you add some examples of querying an array child? The user has to provide arr_name.item.
CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-817751347

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5154/
CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-817764047

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3402/
ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r611579153

##########
File path: docs/ddl-of-carbondata.md
##########
@@ -646,6 +647,28 @@ CarbonData DDL statements are documented here,which includes:
 ## TABLE MANAGEMENT
+### DESCRIBE COMMAND
+
+Describe column of table and visualize its structure with child fields.
+  ```
+  DESCRIBE COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;

Review comment:
Done

##########
File path: docs/ddl-of-carbondata.md
##########
@@ -646,6 +647,28 @@ CarbonData DDL statements are documented here,which includes:
 ## TABLE MANAGEMENT
+### DESCRIBE COMMAND
+
+Describe column of table and visualize its structure with child fields.
+  ```
+  DESCRIBE COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;
+
+  Example: DESCRIBE COLUMN channelsId ON carbonTable;

Review comment:
Done