[GitHub] [carbondata] ShreelekhyaG opened a new pull request #4113: [WIP] Describe complex columns

classic Classic list List threaded Threaded
57 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox

Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r612947980



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,47 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
 
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {
+      throw new MalformedCarbonCommandException(
+        s"${inputFields.get(0)} not present in schema of table: $tableName")

Review comment:
       ```suggestion
           s"Column ${inputFields.get(0)} does not exists in the table $dbName.$tableName")
   ```




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r612958690



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,47 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
 
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {
+      throw new MalformedCarbonCommandException(
+        s"${inputFields.get(0)} not present in schema of table: $tableName")
+    }
+    CarbonDescribeColumnCommand(
+      carbonTable,
+      inputFields,
+      inputColumn.get
+    )
+  }
+
+  def describeShort(
+      databaseNameOp: Option[String],
+      tableName: String
+  ): CarbonDescribeShortCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    CarbonDescribeShortCommand(
+      carbonTable,
+      tableSchema
+    )
+  }
+
+  def getSchemaAndTable(databaseNameOp: Option[String],
+      tableName: String) : (StructType, CarbonTable) = {
+    val sparkSession = SparkSQLUtil.getSparkSession
+    validateTableExists(databaseNameOp, tableName, sparkSession)

Review comment:
       Add a testcase for Describe column on table not exists. Looks like currently, it is not throwing Table not found, instead throws parsing error.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r612959573



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,47 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
 
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {

Review comment:
       ```suggestion
       if (inputColumn.isEmpty) {
   ```




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r612959912



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
##########
@@ -370,3 +373,139 @@ private[sql] case class CarbonDescribeFormattedCommand(
 
   override protected def opName: String = "DESC FORMATTED"
 }
+
+case class CarbonDescribeColumnCommand(
+    carbonTable: CarbonTable,
+    inputFieldNames: java.util.List[String],
+    field: StructField)
+  extends MetadataCommand {
+
+  override val output: Seq[Attribute] = Seq(
+    // Column names are based on Hive.
+    AttributeReference("col_name", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "name of the column").build())(),
+    AttributeReference("data_type", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "data type of the column").build())(),
+    AttributeReference("comment", StringType, nullable = true,
+      new MetadataBuilder().putString("comment", "comment of the column").build())()
+  )
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+    setAuditTable(carbonTable)
+    var results = Seq[(String, String, String)]()
+    var currField = field
+    val inputFieldsIterator = inputFieldNames.iterator()
+    var inputColumn = inputFieldsIterator.next()
+    while (results.size == 0) {

Review comment:
       ```suggestion
       while (results.isEmpty) {
   ```




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r612958690



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,47 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
 
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {
+      throw new MalformedCarbonCommandException(
+        s"${inputFields.get(0)} not present in schema of table: $tableName")
+    }
+    CarbonDescribeColumnCommand(
+      carbonTable,
+      inputFields,
+      inputColumn.get
+    )
+  }
+
+  def describeShort(
+      databaseNameOp: Option[String],
+      tableName: String
+  ): CarbonDescribeShortCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    CarbonDescribeShortCommand(
+      carbonTable,
+      tableSchema
+    )
+  }
+
+  def getSchemaAndTable(databaseNameOp: Option[String],
+      tableName: String) : (StructType, CarbonTable) = {
+    val sparkSession = SparkSQLUtil.getSparkSession
+    validateTableExists(databaseNameOp, tableName, sparkSession)

Review comment:
       Add a testcase for Describe column on table not exists. Looks like currently, it is not throwing Table not found, instead throws parsing error. Since, validations are added inside Parser, it is throwing Parsing error. Remove these validations from Parser and table in Command.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r612958690



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,47 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
 
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {
+      throw new MalformedCarbonCommandException(
+        s"${inputFields.get(0)} not present in schema of table: $tableName")
+    }
+    CarbonDescribeColumnCommand(
+      carbonTable,
+      inputFields,
+      inputColumn.get
+    )
+  }
+
+  def describeShort(
+      databaseNameOp: Option[String],
+      tableName: String
+  ): CarbonDescribeShortCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    CarbonDescribeShortCommand(
+      carbonTable,
+      tableSchema
+    )
+  }
+
+  def getSchemaAndTable(databaseNameOp: Option[String],
+      tableName: String) : (StructType, CarbonTable) = {
+    val sparkSession = SparkSQLUtil.getSparkSession
+    validateTableExists(databaseNameOp, tableName, sparkSession)

Review comment:
       Add a testcase for Describe column on table not exists. Looks like currently, it is not throwing Table not found, instead throws parsing error. Since, validations are added inside Parser, it is throwing Parsing error. Remove these validations from Parser and hanble in Command.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r613093529



##########
File path: docs/ddl-of-carbondata.md
##########
@@ -646,6 +647,69 @@ CarbonData DDL statements are documented here,which includes:
 
 ## TABLE MANAGEMENT  
 
+### DESCRIBE COMMAND
+
+Describe column of table and visualize its structure with child fields.
+  ```
+  [DESCRIBE | DESC] COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;
+
+  Examples:
+  
+    DESCRIBE COLUMN locationinfo on complexcarbontable;
+    
+    +------------------------------+--------------------------------------------------------------------------------------------------------------------+-------+
+    |col_name                      |data_type                                                                                                           |comment|
+    +------------------------------+--------------------------------------------------------------------------------------------------------------------+-------+
+    |locationinfo                  |array                                                                                                               |null   |
+    |## Children of locationinfo:  |                                                                                                                    |       |
+    |item                          |struct<activeprovince:string,activecity:map<string,string>,activedistrict:string,activestreet:array<string>>        |null   |
+    +------------------------------+--------------------------------------------------------------------------------------------------------------------+-------+
+    
+    DESCRIBE COLUMN locationinfo.item on complexcarbontable
+    
+    +-----------------------------------+------------------+-------+
+    |col_name                           |data_type         |comment|
+    +-----------------------------------+------------------+-------+
+    |locationinfo.item                  |struct            |null   |

Review comment:
       comment section always null ?




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r613099033



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
##########
@@ -370,3 +374,147 @@ private[sql] case class CarbonDescribeFormattedCommand(
 
   override protected def opName: String = "DESC FORMATTED"
 }
+
+case class CarbonDescribeColumnCommand(
+    databaseNameOp: Option[String],
+    tableName: String,
+    inputFieldNames: java.util.List[String])
+  extends MetadataCommand {
+
+  override val output: Seq[Attribute] = Seq(
+    // Column names are based on Hive.
+    AttributeReference("col_name", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "name of the column").build())(),
+    AttributeReference("data_type", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "data type of the column").build())(),
+    AttributeReference("comment", StringType, nullable = true,
+      new MetadataBuilder().putString("comment", "comment of the column").build())()
+  )
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+    val (tableSchema, carbonTable) = Checker.getSchemaAndTable(databaseNameOp, tableName)
+    val inputField = tableSchema.find(_.name.equalsIgnoreCase(inputFieldNames.get(0)))
+    if (inputField.isEmpty) {
+      throw new MalformedCarbonCommandException(
+        s"Column ${ inputFieldNames.get(0) } does not exists in the table " +
+        s"${ carbonTable.getDatabaseName }.$tableName")
+    }
+    setAuditTable(carbonTable)
+    var results = Seq[(String, String, String)]()
+    var currField = inputField.get
+    val inputFieldsIterator = inputFieldNames.iterator()
+    var inputColumn = inputFieldsIterator.next()
+    while (results.isEmpty) {
+      breakable {
+        if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.ARRAY)) {

Review comment:
       is this command will work only for complex columns ? if we give primitive type column then whats the expected behavior ?




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-819397024


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5176/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r613113388



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
##########
@@ -370,3 +374,147 @@ private[sql] case class CarbonDescribeFormattedCommand(
 
   override protected def opName: String = "DESC FORMATTED"
 }
+
+case class CarbonDescribeColumnCommand(
+    databaseNameOp: Option[String],
+    tableName: String,
+    inputFieldNames: java.util.List[String])
+  extends MetadataCommand {
+
+  override val output: Seq[Attribute] = Seq(
+    // Column names are based on Hive.
+    AttributeReference("col_name", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "name of the column").build())(),
+    AttributeReference("data_type", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "data type of the column").build())(),
+    AttributeReference("comment", StringType, nullable = true,
+      new MetadataBuilder().putString("comment", "comment of the column").build())()
+  )
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+    val (tableSchema, carbonTable) = Checker.getSchemaAndTable(databaseNameOp, tableName)
+    val inputField = tableSchema.find(_.name.equalsIgnoreCase(inputFieldNames.get(0)))
+    if (inputField.isEmpty) {
+      throw new MalformedCarbonCommandException(
+        s"Column ${ inputFieldNames.get(0) } does not exists in the table " +
+        s"${ carbonTable.getDatabaseName }.$tableName")
+    }
+    setAuditTable(carbonTable)
+    var results = Seq[(String, String, String)]()
+    var currField = inputField.get
+    val inputFieldsIterator = inputFieldNames.iterator()
+    var inputColumn = inputFieldsIterator.next()
+    while (results.isEmpty) {
+      breakable {
+        if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.ARRAY)) {
+          if (inputFieldsIterator.hasNext) {
+            val nextField = inputFieldsIterator.next()
+            // child of an array can be only 'item'
+            if (!nextField.equalsIgnoreCase("item")) {
+              throw handleException(nextField, currField.name, carbonTable.getTableName)
+            }
+            // make the child type as current field to describe further nested types.
+            currField = StructField("item", currField.dataType.asInstanceOf[ArrayType].elementType)
+            inputColumn += "." + currField.name
+            break()
+          }
+          // if no further nested column given, display field information and its children.
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")),
+            ("## Children of " + inputColumn + ":  ", "", ""))
+          results ++= Seq(("item", currField.dataType.asInstanceOf[ArrayType]
+            .elementType.simpleString, "null"))
+        }
+        else if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.STRUCT)) {
+          if (inputFieldsIterator.hasNext) {
+            val nextField = inputFieldsIterator.next()
+            val nextCurrField = currField.dataType.asInstanceOf[StructType].fields
+              .find(_.name.equalsIgnoreCase(nextField))
+            // verify if the input child name exists in the schema
+            if (!nextCurrField.isDefined) {
+              throw handleException(nextField, currField.name, carbonTable.getTableName)
+            }
+            // make the child type as current field to describe further nested types.
+            currField = nextCurrField.get
+            inputColumn += "." + currField.name
+            break()
+          }
+          // if no further nested column given, display field information and its children.
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")),
+            ("## Children of " + inputColumn + ":  ", "", ""))
+          results ++= currField.dataType.asInstanceOf[StructType].fields.map(child => {
+            (child.name, child.dataType.simpleString, "null")
+          })
+        } else if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.MAP)) {
+          val children = currField.dataType.asInstanceOf[MapType]
+          if (inputFieldsIterator.hasNext) {
+            val nextField = inputFieldsIterator.next().toLowerCase()
+            // children of map can be only 'key' and 'value'
+            val nextCurrField = nextField match {
+              case "key" => StructField("key", children.keyType)
+              case "value" => StructField("value", children.valueType)
+              case _ => throw handleException(nextField, currField.name, carbonTable.getTableName)
+            }
+            // make the child type as current field to describe further nested types.
+            currField = nextCurrField
+            inputColumn += "." + currField.name
+            break()
+          }
+          // if no further nested column given, display field information and its children.
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")),
+            ("## Children of " + inputColumn + ":  ", "", ""))
+          results ++= Seq(("key", children.keyType.simpleString, "null"),
+            ("value", children.valueType.simpleString, "null"))
+        } else {
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")))
+        }
+      }
+    }
+    results.map { case (c1, c2, c3) => Row(c1, c2, c3) }
+  }
+
+  def handleException(nextField: String, currField: String, tableName: String): Throwable = {

Review comment:
       can you add a test case for catching this exception?




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-819403628


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3424/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r613119845



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,47 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
 
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {

Review comment:
       Done




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r613119948



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
##########
@@ -744,6 +747,47 @@ object CarbonSparkSqlParserUtil {
     CarbonAlterTableAddColumnCommand(alterTableAddColumnsModel)
   }
 
+  def describeColumn(
+      databaseNameOp: Option[String],
+      tableName: String,
+      inputFields: java.util.List[String]
+  ): CarbonDescribeColumnCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    val inputColumn = tableSchema.find(_.name.equalsIgnoreCase(inputFields.get(0)))
+    if (!inputColumn.isDefined) {
+      throw new MalformedCarbonCommandException(
+        s"${inputFields.get(0)} not present in schema of table: $tableName")
+    }
+    CarbonDescribeColumnCommand(
+      carbonTable,
+      inputFields,
+      inputColumn.get
+    )
+  }
+
+  def describeShort(
+      databaseNameOp: Option[String],
+      tableName: String
+  ): CarbonDescribeShortCommand = {
+    val (tableSchema, carbonTable) = getSchemaAndTable(databaseNameOp, tableName)
+    CarbonDescribeShortCommand(
+      carbonTable,
+      tableSchema
+    )
+  }
+
+  def getSchemaAndTable(databaseNameOp: Option[String],
+      tableName: String) : (StructType, CarbonTable) = {
+    val sparkSession = SparkSQLUtil.getSparkSession
+    validateTableExists(databaseNameOp, tableName, sparkSession)

Review comment:
       Done




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r613122469



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
##########
@@ -370,3 +374,147 @@ private[sql] case class CarbonDescribeFormattedCommand(
 
   override protected def opName: String = "DESC FORMATTED"
 }
+
+case class CarbonDescribeColumnCommand(
+    databaseNameOp: Option[String],
+    tableName: String,
+    inputFieldNames: java.util.List[String])
+  extends MetadataCommand {
+
+  override val output: Seq[Attribute] = Seq(
+    // Column names are based on Hive.
+    AttributeReference("col_name", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "name of the column").build())(),
+    AttributeReference("data_type", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "data type of the column").build())(),
+    AttributeReference("comment", StringType, nullable = true,
+      new MetadataBuilder().putString("comment", "comment of the column").build())()
+  )
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+    val (tableSchema, carbonTable) = Checker.getSchemaAndTable(databaseNameOp, tableName)
+    val inputField = tableSchema.find(_.name.equalsIgnoreCase(inputFieldNames.get(0)))
+    if (inputField.isEmpty) {
+      throw new MalformedCarbonCommandException(
+        s"Column ${ inputFieldNames.get(0) } does not exists in the table " +
+        s"${ carbonTable.getDatabaseName }.$tableName")
+    }
+    setAuditTable(carbonTable)
+    var results = Seq[(String, String, String)]()
+    var currField = inputField.get
+    val inputFieldsIterator = inputFieldNames.iterator()
+    var inputColumn = inputFieldsIterator.next()
+    while (results.isEmpty) {
+      breakable {
+        if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.ARRAY)) {

Review comment:
       It works for all columns. If we give describe primitive column, it simply displays column name, datatype, comment details. I have added one such query in the test case.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r613149270



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
##########
@@ -370,3 +374,147 @@ private[sql] case class CarbonDescribeFormattedCommand(
 
   override protected def opName: String = "DESC FORMATTED"
 }
+
+case class CarbonDescribeColumnCommand(
+    databaseNameOp: Option[String],
+    tableName: String,
+    inputFieldNames: java.util.List[String])
+  extends MetadataCommand {
+
+  override val output: Seq[Attribute] = Seq(
+    // Column names are based on Hive.
+    AttributeReference("col_name", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "name of the column").build())(),
+    AttributeReference("data_type", StringType, nullable = false,
+      new MetadataBuilder().putString("comment", "data type of the column").build())(),
+    AttributeReference("comment", StringType, nullable = true,
+      new MetadataBuilder().putString("comment", "comment of the column").build())()
+  )
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+    val (tableSchema, carbonTable) = Checker.getSchemaAndTable(databaseNameOp, tableName)
+    val inputField = tableSchema.find(_.name.equalsIgnoreCase(inputFieldNames.get(0)))
+    if (inputField.isEmpty) {
+      throw new MalformedCarbonCommandException(
+        s"Column ${ inputFieldNames.get(0) } does not exists in the table " +
+        s"${ carbonTable.getDatabaseName }.$tableName")
+    }
+    setAuditTable(carbonTable)
+    var results = Seq[(String, String, String)]()
+    var currField = inputField.get
+    val inputFieldsIterator = inputFieldNames.iterator()
+    var inputColumn = inputFieldsIterator.next()
+    while (results.isEmpty) {
+      breakable {
+        if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.ARRAY)) {
+          if (inputFieldsIterator.hasNext) {
+            val nextField = inputFieldsIterator.next()
+            // child of an array can be only 'item'
+            if (!nextField.equalsIgnoreCase("item")) {
+              throw handleException(nextField, currField.name, carbonTable.getTableName)
+            }
+            // make the child type as current field to describe further nested types.
+            currField = StructField("item", currField.dataType.asInstanceOf[ArrayType].elementType)
+            inputColumn += "." + currField.name
+            break()
+          }
+          // if no further nested column given, display field information and its children.
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")),
+            ("## Children of " + inputColumn + ":  ", "", ""))
+          results ++= Seq(("item", currField.dataType.asInstanceOf[ArrayType]
+            .elementType.simpleString, "null"))
+        }
+        else if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.STRUCT)) {
+          if (inputFieldsIterator.hasNext) {
+            val nextField = inputFieldsIterator.next()
+            val nextCurrField = currField.dataType.asInstanceOf[StructType].fields
+              .find(_.name.equalsIgnoreCase(nextField))
+            // verify if the input child name exists in the schema
+            if (!nextCurrField.isDefined) {
+              throw handleException(nextField, currField.name, carbonTable.getTableName)
+            }
+            // make the child type as current field to describe further nested types.
+            currField = nextCurrField.get
+            inputColumn += "." + currField.name
+            break()
+          }
+          // if no further nested column given, display field information and its children.
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")),
+            ("## Children of " + inputColumn + ":  ", "", ""))
+          results ++= currField.dataType.asInstanceOf[StructType].fields.map(child => {
+            (child.name, child.dataType.simpleString, "null")
+          })
+        } else if (currField.dataType.typeName.equalsIgnoreCase(CarbonCommonConstants.MAP)) {
+          val children = currField.dataType.asInstanceOf[MapType]
+          if (inputFieldsIterator.hasNext) {
+            val nextField = inputFieldsIterator.next().toLowerCase()
+            // children of map can be only 'key' and 'value'
+            val nextCurrField = nextField match {
+              case "key" => StructField("key", children.keyType)
+              case "value" => StructField("value", children.valueType)
+              case _ => throw handleException(nextField, currField.name, carbonTable.getTableName)
+            }
+            // make the child type as current field to describe further nested types.
+            currField = nextCurrField
+            inputColumn += "." + currField.name
+            break()
+          }
+          // if no further nested column given, display field information and its children.
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")),
+            ("## Children of " + inputColumn + ":  ", "", ""))
+          results ++= Seq(("key", children.keyType.simpleString, "null"),
+            ("value", children.valueType.simpleString, "null"))
+        } else {
+          results = Seq((inputColumn,
+            currField.dataType.typeName, currField.getComment().getOrElse("null")))
+        }
+      }
+    }
+    results.map { case (c1, c2, c3) => Row(c1, c2, c3) }
+  }
+
+  def handleException(nextField: String, currField: String, tableName: String): Throwable = {

Review comment:
       testcase present in -` test describe with invalid table and field names`




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#discussion_r613171410



##########
File path: docs/ddl-of-carbondata.md
##########
@@ -646,6 +647,69 @@ CarbonData DDL statements are documented here,which includes:
 
 ## TABLE MANAGEMENT  
 
+### DESCRIBE COMMAND
+
+Describe column of table and visualize its structure with child fields.
+  ```
+  [DESCRIBE | DESC] COLUMN fieldname[.nestedFieldNames] ON [db_name.]table_name;
+
+  Examples:
+  
+    DESCRIBE COLUMN locationinfo on complexcarbontable;
+    
+    +------------------------------+--------------------------------------------------------------------------------------------------------------------+-------+
+    |col_name                      |data_type                                                                                                           |comment|
+    +------------------------------+--------------------------------------------------------------------------------------------------------------------+-------+
+    |locationinfo                  |array                                                                                                               |null   |
+    |## Children of locationinfo:  |                                                                                                                    |       |
+    |item                          |struct<activeprovince:string,activecity:map<string,string>,activedistrict:string,activestreet:array<string>>        |null   |
+    +------------------------------+--------------------------------------------------------------------------------------------------------------------+-------+
+    
+    DESCRIBE COLUMN locationinfo.item on complexcarbontable
+    
+    +-----------------------------------+------------------+-------+
+    |col_name                           |data_type         |comment|
+    +-----------------------------------+------------------+-------+
+    |locationinfo.item                  |struct            |null   |

Review comment:
       updated the example where the comment is not null.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-819508770


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5178/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-819511591


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3426/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-819540773


   retest this please


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus commented on pull request #4113: [CARBONDATA-4161] Describe complex columns

GitBox
In reply to this post by GitBox

maheshrajus commented on pull request #4113:
URL: https://github.com/apache/carbondata/pull/4113#issuecomment-819578452


   LGTM


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


123