[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

[GitHub] carbondata issue #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns given ...

qiuchenjian-2
Github user sounakr commented on the issue:

    https://github.com/apache/carbondata/pull/2261
 
    Retest this please


---

[GitHub] carbondata issue #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns given ...

Github user sounakr commented on the issue:

    https://github.com/apache/carbondata/pull/2261
 
    Retest this please


---

[GitHub] carbondata issue #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns given ...

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2261
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4477/



---

[GitHub] carbondata issue #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns given ...

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2261
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4480/



---

[GitHub] carbondata issue #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns given ...

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2261
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5640/



---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186041329
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -416,16 +411,58 @@ private CarbonTable buildCarbonTable() {
         }
         TableSchema schema = tableSchemaBuilder.build();
         schema.setTableName(tableName);
    -    CarbonTable table = CarbonTable.builder()
    -        .tableName(schema.getTableName())
    -        .databaseName(dbName)
    -        .tablePath(path)
    -        .tableSchema(schema)
    -        .isTransactionalTable(isTransactionalTable)
    -        .build();
    +    CarbonTable table =
    +        CarbonTable.builder().tableName(schema.getTableName()).databaseName(dbName).tablePath(path)
    +            .tableSchema(schema).isTransactionalTable(isTransactionalTable).build();
         return table;
       }
     
    +  private void buildTableSchema(Field[] fields, TableSchemaBuilder tableSchemaBuilder,
    +      List<String> sortColumnsList, ColumnSchema[] sortColumnsSchemaList) {
    +    for (Field field : fields) {
    +      if (null != field) {
    +        int isSortColumn = sortColumnsList.indexOf(field.getFieldName());
    +        if (isSortColumn > -1) {
    +          // unsupported types for ("array", "struct", "double", "float", "decimal")
    +          if (field.getDataType() == DataTypes.DOUBLE || field.getDataType() == DataTypes.FLOAT
    +              || DataTypes.isDecimal(field.getDataType()) || DataTypes
    +              .isArrayType(field.getDataType()) || DataTypes.isStructType(field.getDataType())) {
    +            throw new RuntimeException(
    +                " sort columns not supported for " + "array, struct, double, float, decimal ");
    +          }
    +        }
    +
    +        if (field.getChildren() != null && field.getChildren().size() > 0) {
    +          if (field.getDataType().getName().equalsIgnoreCase("ARRAY")) {
    +            // Loop through the inner columns and for a StructData
    +            DataType complexType =
    +                DataTypes.createArrayType(field.getChildren().get(0).getDataType());
    +            tableSchemaBuilder.addColumn(new StructField(field.getFieldName(), complexType), false);
    +          } else if (field.getDataType().getName().equalsIgnoreCase("STRUCT")) {
    +            // Loop through the inner columns and for a StructData
    +            List<StructField> structFieldsArray =
    +                new ArrayList<StructField>(field.getChildren().size());
    +            for (StructField childFld : field.getChildren()) {
    +              structFieldsArray
    +                  .add(new StructField(childFld.getFieldName(), childFld.getDataType()));
    +            }
    +            DataType complexType = DataTypes.createStructType(structFieldsArray);
    +            tableSchemaBuilder.addColumn(new StructField(field.getFieldName(), complexType), false);
    +          }
    +        } else {
    +
    --- End diff --
   
    remove empty lines


---

[GitHub] carbondata issue #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns given ...

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2261
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4719/



---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186043853
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java ---
    @@ -108,21 +115,36 @@ public void setSortColumns(List<ColumnSchema> sortColumns) {
       }
     
       public ColumnSchema addColumn(StructField field, boolean isSortColumn) {
    +    return addColumn(field, null, isSortColumn, false);
    +  }
    +
    +  private ColumnSchema addColumn(StructField field, String parentName, boolean isSortColumn,
    +      boolean isComplexChild) {
         Objects.requireNonNull(field);
         checkRepeatColumnName(field);
         ColumnSchema newColumn = new ColumnSchema();
    -    newColumn.setColumnName(field.getFieldName());
    +    if (parentName != null) {
    +      newColumn.setColumnName(parentName + "." + field.getFieldName());
    +    } else {
    +      newColumn.setColumnName(field.getFieldName());
    +    }
         newColumn.setDataType(field.getDataType());
         if (isSortColumn ||
             field.getDataType() == DataTypes.STRING ||
    --- End diff --
   
    Instead of adding so many conditions to the if, use a negated check: if (!(field.getDataType() == DataTypes.DOUBLE || DataTypes.isDecimal(field.getDataType()))). A dimension column can also be long, short or int, which the current check does not handle.
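A minimal sketch of the negated check proposed above. It uses a hypothetical SketchType enum in place of CarbonData's DataTypes; the real code compares DataType instances. Only DOUBLE and DECIMAL are excluded, exactly as in the comment. Whether FLOAT and other measure types should also be excluded is left open here.

```java
// Hypothetical stand-in for CarbonData's data types; not the real DataTypes class.
enum SketchType { STRING, DATE, TIMESTAMP, BOOLEAN, SHORT, INT, LONG, FLOAT, DOUBLE, DECIMAL, ARRAY, STRUCT }

final class DimensionCheck {
  private DimensionCheck() {}

  // Reviewer's proposal: a column is a dimension if it is a sort column or its type is
  // neither DOUBLE nor DECIMAL, so SHORT, INT and LONG no longer need to be listed one by one.
  static boolean isDimension(SketchType type, boolean isSortColumn) {
    return isSortColumn || !(type == SketchType.DOUBLE || type == SketchType.DECIMAL);
  }
}
```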


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186049025
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -367,9 +384,8 @@ private CarbonTable buildCarbonTable() {
           //  user passed size 4 but supplied only 2 fileds
           for (Field field : schema.getFields()) {
             if (null != field) {
    -          if (field.getDataType() == DataTypes.STRING ||
    -              field.getDataType() == DataTypes.DATE ||
    -              field.getDataType() == DataTypes.TIMESTAMP) {
    +          if (field.getDataType() == DataTypes.STRING || field.getDataType() == DataTypes.DATE
    --- End diff --
   
    Sort columns are also supported for the long, short and int data types. Add a reverse check instead: if the data type is not double, decimal, struct or array, add the column as a sort column.
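Below is a sketch of the reverse check applied to default sort-column selection. It assumes a hypothetical FieldType enum and a name-to-type map in place of the SDK's Field[] schema. The unsupported set is taken from the validation in buildTableSchema above (array, struct, double, float, decimal). Every other type qualifies, including SHORT, INT and LONG.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical type enum for illustration only; not CarbonData's DataTypes.
enum FieldType { STRING, DATE, TIMESTAMP, BOOLEAN, SHORT, INT, LONG, FLOAT, DOUBLE, DECIMAL, ARRAY, STRUCT }

final class DefaultSortColumns {
  private DefaultSortColumns() {}

  // Types rejected as sort columns by the buildTableSchema validation in the diff.
  private static final Set<FieldType> UNSUPPORTED = EnumSet.of(
      FieldType.DOUBLE, FieldType.FLOAT, FieldType.DECIMAL, FieldType.ARRAY, FieldType.STRUCT);

  // Reverse check: any type outside the unsupported set may be a sort column.
  static boolean isSortColumnSupported(FieldType type) {
    return !UNSUPPORTED.contains(type);
  }

  // With no user-specified sort columns, pick every eligible field, keeping schema order.
  static List<String> pick(Map<String, FieldType> schema) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, FieldType> e : schema.entrySet()) {
      // Skip null entries, mirroring the "null != field" guard in the builder.
      if (e.getValue() != null && isSortColumnSupported(e.getValue())) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```

Keeping the excluded types in one EnumSet means a newly unsupported type is a one-line change.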


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186053512
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java ---
    @@ -135,25 +157,41 @@ public ColumnSchema addColumn(StructField field, boolean isSortColumn) {
         newColumn.setColumnReferenceId(newColumn.getColumnUniqueId());
         newColumn.setEncodingList(createEncoding(field.getDataType(), isSortColumn));
         if (field.getDataType().isComplexType()) {
    -      newColumn.setNumberOfChild(((StructType) field.getDataType()).getFields().size());
    +      if (field.getDataType().getName().equalsIgnoreCase("ARRAY")) {
    +        newColumn.setNumberOfChild(1);
    +      } else {
    +        newColumn.setNumberOfChild(((StructType) field.getDataType()).getFields().size());
    +      }
         }
         if (DataTypes.isDecimal(field.getDataType())) {
           DecimalType decimalType = (DecimalType) field.getDataType();
           newColumn.setPrecision(decimalType.getPrecision());
           newColumn.setScale(decimalType.getScale());
         }
         if (!isSortColumn) {
    -      otherColumns.add(newColumn);
    +      if (!newColumn.isDimensionColumn()) {
    +        measures.add(newColumn);
    +      } else if (DataTypes.isStructType(field.getDataType()) ||
    +          DataTypes.isArrayType(field.getDataType()) || isComplexChild) {
    --- End diff --
   
    Instead of DataTypes.isStructType(field.getDataType()) || DataTypes.isArrayType(field.getDataType()) use field.getDataType().isComplexType()
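The sketch below shows why a single isComplexType() call is preferable: one predicate covers ARRAY, STRUCT and any complex type added later. ColType, ColumnBucket and goesToComplexDimensions are hypothetical names. Only the isComplexType() method name comes from the diff.

```java
// Hypothetical minimal type model in which each type reports whether it is complex;
// this is not CarbonData's DataType class.
enum ColType {
  STRING(false), INT(false), DOUBLE(false), ARRAY(true), STRUCT(true);

  private final boolean complex;

  ColType(boolean complex) {
    this.complex = complex;
  }

  boolean isComplexType() {
    return complex;
  }
}

final class ColumnBucket {
  private ColumnBucket() {}

  // One predicate replaces chaining isStructType(...) || isArrayType(...);
  // children of complex columns are routed to the same bucket.
  static boolean goesToComplexDimensions(ColType type, boolean isComplexChild) {
    return type.isComplexType() || isComplexChild;
  }
}
```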


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186055072
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameComplexTypeExample.scala ---
    @@ -56,6 +72,38 @@ object DataFrameComplexTypeExample {
              | 'dictionary_include'='city')
              | """.stripMargin)
     
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE ${ complexTypeNoDictionaryTableNameArray }(
    +         | id INT,
    +         | name STRING,
    +         | city STRING,
    +         | salary FLOAT,
    +         | file array<string>
    +         | )
    +         | STORED BY 'carbondata'
    +         | TBLPROPERTIES(
    +         | 'sort_columns'='name',
    +         | 'dictionary_include'='city')
    +         | """.stripMargin)
    +
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE ${ complexTypeNoDictionaryTableName }(
    +         | id INT,
    +         | name STRING,
    +         | city STRING,
    +         | salary FLOAT,
    +         | file struct<school:array<string>, school1:array<string>, age:int>
    +         | )
    +         | STORED BY 'carbondata'
    +         | TBLPROPERTIES(
    +         | 'sort_columns'='name',
    +         | 'dictionary_exclude'='val')
    --- End diff --
   
    This example class needs to be fixed after PR #2266 is merged, because that PR makes complex columns dictionary_exclude by default.


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186055186
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
    @@ -750,7 +750,93 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
         buildAvroTestData(3, null)
       }
     
    -  test("Read sdk writer Avro output ") {
    +  def buildAvroTestDataArrayType(rows: Int, options: util.Map[String, String]): Any = {
    +    FileUtils.deleteDirectory(new File(writerPath))
    +    /**
    +     * *
    +     * {
    +     * "name": "address",
    +     * "type": "record",
    +     * "fields": [
    +     * {
    +     * "name": "name",
    +     * "type": "string"
    +     * },
    +     * {
    +     * "name": "age",
    +     * "type": "int"
    +     * },
    +     * {
    +     * "name": "address",
    +     * "type": {
    +     * "type": "array",
    +     * "items": {
    +     * "name": "street",
    +     * "type": "string"
    +     * }
    +     * }
    +     * }
    +     * ]
    +     * }
    +     **/
    +    val mySchema = "{\n" + "\t\"name\": \"address\",\n" + "\t\"type\": \"record\",\n" +
    +                   "\t\"fields\": [\n" + "\t\t{\n" + "\t\t\t\"name\": \"name\",\n" +
    +                   "\t\t\t\"type\": \"string\"\n" + "\t\t},\n" + "\t\t{\n" +
    +                   "\t\t\t\"name\": \"age\",\n" + "\t\t\t\"type\": \"int\"\n" + "\t\t},\n" +
    +                   "\t\t{\n" + "\t\t\t\"name\": \"address\",\n" + "\t\t\t\"type\": {\n" +
    +                   "\t\t\t\t\"type\": \"array\",\n" + "\t\t\t\t\"items\": {\n" +
    +                   "\t\t\t\t\t\"name\": \"street\",\n" +
    +                   "\t\t\t\t\t\"type\": \"string\"\n" + "\t\t\t\t}\n" + "\t\t\t}\n" +
    +                   "\t\t}\n" + "\t]\n" + "}"
    +    /**
    +     * {
    +     * "name": "bob",
    +     * "age": 10,
    +     * "address": [
    +     * "abc", "def"
    +     * ]
    +     * }
    +     **/
    +    val json: String = "{\n" + "\t\"name\": \"bob\",\n" + "\t\"age\": 10,\n" +
    +                       "\t\"address\": [\n" + "\t\t\"abc\", \"defg\"\n" + "\t]\n" + "}"
    +    // conversion to GenericData.Record
    +    val nn = new org.apache.avro.Schema.Parser().parse(mySchema)
    +    val converter = new JsonAvroConverter
    +    val record = converter
    +      .convertToGenericDataRecord(json.getBytes(CharEncoding.UTF_8), nn)
    +    val fields = new Array[Field](3)
    +    fields(0) = new Field("name", DataTypes.STRING)
    +    fields(1) = new Field("age", DataTypes.INT)
    +    // fields[1] = new Field("age", DataTypes.INT);
    --- End diff --
   
    remove commented code


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186055265
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
    @@ -750,7 +750,93 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
         buildAvroTestData(3, null)
       }
     
    -  test("Read sdk writer Avro output ") {
    +  def buildAvroTestDataArrayType(rows: Int, options: util.Map[String, String]): Any = {
    +    FileUtils.deleteDirectory(new File(writerPath))
    +    /**
    +     * *
    +     * {
    +     * "name": "address",
    +     * "type": "record",
    +     * "fields": [
    +     * {
    +     * "name": "name",
    +     * "type": "string"
    +     * },
    +     * {
    +     * "name": "age",
    +     * "type": "int"
    +     * },
    +     * {
    +     * "name": "address",
    +     * "type": {
    +     * "type": "array",
    +     * "items": {
    +     * "name": "street",
    +     * "type": "string"
    +     * }
    +     * }
    +     * }
    +     * ]
    +     * }
    +     **/
    +    val mySchema = "{\n" + "\t\"name\": \"address\",\n" + "\t\"type\": \"record\",\n" +
    +                   "\t\"fields\": [\n" + "\t\t{\n" + "\t\t\t\"name\": \"name\",\n" +
    +                   "\t\t\t\"type\": \"string\"\n" + "\t\t},\n" + "\t\t{\n" +
    +                   "\t\t\t\"name\": \"age\",\n" + "\t\t\t\"type\": \"int\"\n" + "\t\t},\n" +
    +                   "\t\t{\n" + "\t\t\t\"name\": \"address\",\n" + "\t\t\t\"type\": {\n" +
    +                   "\t\t\t\t\"type\": \"array\",\n" + "\t\t\t\t\"items\": {\n" +
    +                   "\t\t\t\t\t\"name\": \"street\",\n" +
    +                   "\t\t\t\t\t\"type\": \"string\"\n" + "\t\t\t\t}\n" + "\t\t\t}\n" +
    +                   "\t\t}\n" + "\t]\n" + "}"
    +    /**
    +     * {
    +     * "name": "bob",
    +     * "age": 10,
    +     * "address": [
    +     * "abc", "def"
    +     * ]
    +     * }
    +     **/
    +    val json: String = "{\n" + "\t\"name\": \"bob\",\n" + "\t\"age\": 10,\n" +
    +                       "\t\"address\": [\n" + "\t\t\"abc\", \"defg\"\n" + "\t]\n" + "}"
    +    // conversion to GenericData.Record
    +    val nn = new org.apache.avro.Schema.Parser().parse(mySchema)
    +    val converter = new JsonAvroConverter
    +    val record = converter
    +      .convertToGenericDataRecord(json.getBytes(CharEncoding.UTF_8), nn)
    +    val fields = new Array[Field](3)
    +    fields(0) = new Field("name", DataTypes.STRING)
    +    fields(1) = new Field("age", DataTypes.INT)
    +    // fields[1] = new Field("age", DataTypes.INT);
    +    val fld = new util.ArrayList[StructField]
    +    fld.add(new StructField("street", DataTypes.STRING))
    +    fld.add(new StructField("city", DataTypes.STRING))
    +    fields(2) = new Field("address", "struct", fld)
    +    try {
    --- End diff --
   
    use intercept[Exception] instead of try catch
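In ScalaTest, intercept[T] runs the body and returns the thrown exception of type T. It fails the test if nothing is thrown, or if an exception of a different type is thrown. A hand-written try/catch has to reimplement exactly that contract. The sketch below is a hypothetical Java helper with the same contract. In JUnit 5, Assertions.assertThrows plays the same role.

```java
final class Intercept {
  private Intercept() {}

  // Hypothetical helper mirroring ScalaTest's intercept[T]: run the body, return the
  // expected exception, and fail if nothing (or an exception of another type) is thrown.
  static <T extends Throwable> T intercept(Class<T> expected, Runnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      throw new AssertionError("Expected " + expected.getName() + " but got " + t, t);
    }
    throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
  }
}
```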


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user sounakr commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186055820
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -367,9 +384,8 @@ private CarbonTable buildCarbonTable() {
           //  user passed size 4 but supplied only 2 fileds
           for (Field field : schema.getFields()) {
             if (null != field) {
    -          if (field.getDataType() == DataTypes.STRING ||
    -              field.getDataType() == DataTypes.DATE ||
    -              field.getDataType() == DataTypes.TIMESTAMP) {
    +          if (field.getDataType() == DataTypes.STRING || field.getDataType() == DataTypes.DATE
    --- End diff --
   
    This check makes all dimension columns sort columns by default when no sort column is specified.


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user sounakr commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186055858
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -416,16 +411,58 @@ private CarbonTable buildCarbonTable() {
         }
         TableSchema schema = tableSchemaBuilder.build();
         schema.setTableName(tableName);
    -    CarbonTable table = CarbonTable.builder()
    -        .tableName(schema.getTableName())
    -        .databaseName(dbName)
    -        .tablePath(path)
    -        .tableSchema(schema)
    -        .isTransactionalTable(isTransactionalTable)
    -        .build();
    +    CarbonTable table =
    +        CarbonTable.builder().tableName(schema.getTableName()).databaseName(dbName).tablePath(path)
    +            .tableSchema(schema).isTransactionalTable(isTransactionalTable).build();
         return table;
       }
     
    +  private void buildTableSchema(Field[] fields, TableSchemaBuilder tableSchemaBuilder,
    +      List<String> sortColumnsList, ColumnSchema[] sortColumnsSchemaList) {
    +    for (Field field : fields) {
    +      if (null != field) {
    +        int isSortColumn = sortColumnsList.indexOf(field.getFieldName());
    +        if (isSortColumn > -1) {
    +          // unsupported types for ("array", "struct", "double", "float", "decimal")
    +          if (field.getDataType() == DataTypes.DOUBLE || field.getDataType() == DataTypes.FLOAT
    +              || DataTypes.isDecimal(field.getDataType()) || DataTypes
    +              .isArrayType(field.getDataType()) || DataTypes.isStructType(field.getDataType())) {
    +            throw new RuntimeException(
    +                " sort columns not supported for " + "array, struct, double, float, decimal ");
    +          }
    +        }
    +
    +        if (field.getChildren() != null && field.getChildren().size() > 0) {
    +          if (field.getDataType().getName().equalsIgnoreCase("ARRAY")) {
    +            // Loop through the inner columns and for a StructData
    +            DataType complexType =
    +                DataTypes.createArrayType(field.getChildren().get(0).getDataType());
    +            tableSchemaBuilder.addColumn(new StructField(field.getFieldName(), complexType), false);
    +          } else if (field.getDataType().getName().equalsIgnoreCase("STRUCT")) {
    +            // Loop through the inner columns and for a StructData
    +            List<StructField> structFieldsArray =
    +                new ArrayList<StructField>(field.getChildren().size());
    +            for (StructField childFld : field.getChildren()) {
    +              structFieldsArray
    +                  .add(new StructField(childFld.getFieldName(), childFld.getDataType()));
    +            }
    +            DataType complexType = DataTypes.createStructType(structFieldsArray);
    +            tableSchemaBuilder.addColumn(new StructField(field.getFieldName(), complexType), false);
    +          }
    +        } else {
    +
    --- End diff --
   
    Done


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186055914
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
    @@ -761,9 +847,29 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
         sql("select * from sdkOutputTable").show(false)
     
         checkAnswer(sql("select * from sdkOutputTable"), Seq(
    -      Row("bob", "10", Row("abc","bang")),
    -      Row("bob", "10", Row("abc","bang")),
    -      Row("bob", "10", Row("abc","bang"))))
    +      Row("bob", 10, Row("abc","bang")),
    +      Row("bob", 10, Row("abc","bang")),
    +      Row("bob", 10, Row("abc","bang"))))
    +
    +    sql("DROP TABLE sdkOutputTable")
    +    // drop table should not delete the files
    +    assert(new File(writerPath).exists())
    --- End diff --
   
    Check that files exist inside writerPath instead of only checking that the folder exists.
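The sketch below implements a stricter check. It assumes the writer emits files ending in .carbondata under writerPath; the suffix is an assumption, so adjust it to the files the SDK writer actually produces. The check returns false for a missing or empty folder, which the assert on new File(writerPath).exists() cannot distinguish. The helper is plain Java and could be called from the Scala test.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class WriterOutputCheck {
  private WriterOutputCheck() {}

  // True only if dir exists and contains, at any depth, at least one regular file
  // whose name ends with suffix; an empty folder does not count.
  static boolean containsFiles(Path dir, String suffix) throws IOException {
    if (!Files.isDirectory(dir)) {
      return false;
    }
    // Files.walk visits dir itself and everything below it; try-with-resources closes the stream.
    try (Stream<Path> entries = Files.walk(dir)) {
      return entries.anyMatch(p -> Files.isRegularFile(p)
          && p.getFileName().toString().endsWith(suffix));
    }
  }
}
```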


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186056119
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ---
    @@ -604,31 +604,58 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
             tableProperties.get(CarbonCommonConstants.DICTIONARY_EXCLUDE).get.split(',').map(_.trim)
           dictExcludeCols
             .foreach { dictExcludeCol =>
    -          if (!fields.exists(x => x.column.equalsIgnoreCase(dictExcludeCol))) {
    +          if (!checkFields(fields, dictExcludeCol)) {
                 val errormsg = "DICTIONARY_EXCLUDE column: " + dictExcludeCol +
                                " does not exist in table. Please check create table statement."
                 throw new MalformedCarbonCommandException(errormsg)
               } else {
    -            val dataType = fields.find(x =>
    -              x.column.equalsIgnoreCase(dictExcludeCol)).get.dataType.get
    -            if (isComplexDimDictionaryExclude(dataType)) {
    -              val errormsg = "DICTIONARY_EXCLUDE is unsupported for complex datatype column: " +
    -                             dictExcludeCol
    -              throw new MalformedCarbonCommandException(errormsg)
    -            } else if (!isDataTypeSupportedForDictionary_Exclude(dataType)) {
    +            val dataType = findField(fields, dictExcludeCol).get.dataType.get
    +            if (!isDataTypeSupportedForDictionary_Exclude(dataType)) {
                   val errorMsg = "DICTIONARY_EXCLUDE is unsupported for " + dataType.toLowerCase() +
                                  " data type column: " + dictExcludeCol
                   throw new MalformedCarbonCommandException(errorMsg)
                 }
               }
             }
         }
    +
    +
    +    def checkFields(y: Seq[Field], colToMatch: String): Boolean = {
    +      y.exists { fld =>
    +        if (fld.column.equalsIgnoreCase(colToMatch)) {
    +          true
    +        } else if (fld.children.isDefined && fld.children.get != null) {
    +          checkFields(fld.children.get, colToMatch)
    +        } else {
    +          false
    +        }
    +      }
    +    }
    +
    +    def findField(y: Seq[Field], colToMatch: String): Option[Field] = {
    --- End diff --
   
    Add descriptions (doc comments) for checkFields and findField.
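The sketch below shows one possible shape for those descriptions, as a Java version of the same recursive lookup. A hypothetical SchemaField class stands in for the parser's Field, holding a column name plus optional children. Here checkFields is expressed through findField so the two cannot disagree, whereas the Scala version in the diff implements them separately.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical field model: a column name plus optional nested children (struct/array members).
final class SchemaField {
  final String column;
  final List<SchemaField> children; // null or empty for primitive columns

  SchemaField(String column, List<SchemaField> children) {
    this.column = column;
    this.children = children;
  }
}

final class FieldLookup {
  private FieldLookup() {}

  /** Returns true if a field named colToMatch (case-insensitive) exists at any nesting level. */
  static boolean checkFields(List<SchemaField> fields, String colToMatch) {
    return findField(fields, colToMatch).isPresent();
  }

  /**
   * Depth-first search for the first field named colToMatch (case-insensitive),
   * descending into the children of complex fields before moving to the next sibling.
   */
  static Optional<SchemaField> findField(List<SchemaField> fields, String colToMatch) {
    for (SchemaField fld : fields) {
      if (fld.column.equalsIgnoreCase(colToMatch)) {
        return Optional.of(fld);
      }
      if (fld.children != null && !fld.children.isEmpty()) {
        Optional<SchemaField> nested = findField(fld.children, colToMatch);
        if (nested.isPresent()) {
          return nested;
        }
      }
    }
    return Optional.empty();
  }
}
```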


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186056900
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -416,16 +411,58 @@ private CarbonTable buildCarbonTable() {
         }
         TableSchema schema = tableSchemaBuilder.build();
         schema.setTableName(tableName);
    -    CarbonTable table = CarbonTable.builder()
    -        .tableName(schema.getTableName())
    -        .databaseName(dbName)
    -        .tablePath(path)
    -        .tableSchema(schema)
    -        .isTransactionalTable(isTransactionalTable)
    -        .build();
    +    CarbonTable table =
    +        CarbonTable.builder().tableName(schema.getTableName()).databaseName(dbName).tablePath(path)
    +            .tableSchema(schema).isTransactionalTable(isTransactionalTable).build();
         return table;
       }
     
    +  private void buildTableSchema(Field[] fields, TableSchemaBuilder tableSchemaBuilder,
    +      List<String> sortColumnsList, ColumnSchema[] sortColumnsSchemaList) {
    +    for (Field field : fields) {
    +      if (null != field) {
    +        int isSortColumn = sortColumnsList.indexOf(field.getFieldName());
    +        if (isSortColumn > -1) {
    +          // unsupported types for ("array", "struct", "double", "float", "decimal")
    +          if (field.getDataType() == DataTypes.DOUBLE || field.getDataType() == DataTypes.FLOAT
    +              || DataTypes.isDecimal(field.getDataType()) || DataTypes
    +              .isArrayType(field.getDataType()) || DataTypes.isStructType(field.getDataType())) {
    --- End diff --
   
    use field.getDataType().isComplexType() to check for complex types


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user sounakr commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186056928
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java ---
    @@ -108,21 +115,36 @@ public void setSortColumns(List<ColumnSchema> sortColumns) {
       }
     
       public ColumnSchema addColumn(StructField field, boolean isSortColumn) {
    +    return addColumn(field, null, isSortColumn, false);
    +  }
    +
    +  private ColumnSchema addColumn(StructField field, String parentName, boolean isSortColumn,
    +      boolean isComplexChild) {
         Objects.requireNonNull(field);
         checkRepeatColumnName(field);
         ColumnSchema newColumn = new ColumnSchema();
    -    newColumn.setColumnName(field.getFieldName());
    +    if (parentName != null) {
    +      newColumn.setColumnName(parentName + "." + field.getFieldName());
    +    } else {
    +      newColumn.setColumnName(field.getFieldName());
    +    }
         newColumn.setDataType(field.getDataType());
         if (isSortColumn ||
             field.getDataType() == DataTypes.STRING ||
    --- End diff --
   
    If I had to use a negative condition instead, I would have to list all of these:
    Non Dictionary
    -------------------
    Boolean
    short
    int
    float
    long
    double
    Null
    Byte
   
   
    Dictionary
    -------------
    string
    array
    struct
    timestamp
    date
    char
   
   
    As the dictionary check is shorter, I am keeping it as it is.
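The sketch below shows the positive check being kept, with the dictionary list held in an EnumSet. EncType is a hypothetical enum mirroring the two lists above, not CarbonData's DataTypes. Membership lookup in an EnumSet is a bit-vector test, so the shorter list also costs nothing at runtime.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum covering the types listed above; not CarbonData's DataTypes.
enum EncType { BOOLEAN, SHORT, INT, FLOAT, LONG, DOUBLE, NULL, BYTE, STRING, ARRAY, STRUCT, TIMESTAMP, DATE, CHAR }

final class DictionaryEncodingCheck {
  private DictionaryEncodingCheck() {}

  // Positive check: only the six dictionary types from the comment are listed.
  private static final Set<EncType> DICTIONARY = EnumSet.of(
      EncType.STRING, EncType.ARRAY, EncType.STRUCT, EncType.TIMESTAMP, EncType.DATE, EncType.CHAR);

  // Sort columns are always treated as dictionary columns, as in the original if condition.
  static boolean usesDictionary(EncType type, boolean isSortColumn) {
    return isSortColumn || DICTIONARY.contains(type);
  }
}
```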


---

[GitHub] carbondata pull request #2261: [CARBONDATA-2430][SDK] Reshuffling of Columns...

Github user sounakr commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2261#discussion_r186057583
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
    @@ -416,16 +411,58 @@ private CarbonTable buildCarbonTable() {
         }
         TableSchema schema = tableSchemaBuilder.build();
         schema.setTableName(tableName);
    -    CarbonTable table = CarbonTable.builder()
    -        .tableName(schema.getTableName())
    -        .databaseName(dbName)
    -        .tablePath(path)
    -        .tableSchema(schema)
    -        .isTransactionalTable(isTransactionalTable)
    -        .build();
    +    CarbonTable table =
    +        CarbonTable.builder().tableName(schema.getTableName()).databaseName(dbName).tablePath(path)
    +            .tableSchema(schema).isTransactionalTable(isTransactionalTable).build();
         return table;
       }
     
    +  private void buildTableSchema(Field[] fields, TableSchemaBuilder tableSchemaBuilder,
    +      List<String> sortColumnsList, ColumnSchema[] sortColumnsSchemaList) {
    +    for (Field field : fields) {
    +      if (null != field) {
    +        int isSortColumn = sortColumnsList.indexOf(field.getFieldName());
    +        if (isSortColumn > -1) {
    +          // unsupported types for ("array", "struct", "double", "float", "decimal")
    +          if (field.getDataType() == DataTypes.DOUBLE || field.getDataType() == DataTypes.FLOAT
    +              || DataTypes.isDecimal(field.getDataType()) || DataTypes
    +              .isArrayType(field.getDataType()) || DataTypes.isStructType(field.getDataType())) {
    --- End diff --
   
    Done


---