[GitHub] carbondata pull request #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

classic Classic list List threaded Threaded
40 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2375#discussion_r196307346
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -3000,5 +3019,95 @@ public static String getBlockId(AbsoluteTableIdentifier identifier, String fileP
         }
         return blockId;
       }
    +
    +  /**
    +   * sets the local dictionary columns to wrapper schema, if the table property local_dict_include
    +   * is defined, then those columns will be set as local dictionary columns, if not, all the no
    +   * dictionary string datatype columns are set as local dictionary columns.
    +   * Handling for complexTypes::
    +   *   Since the column structure will be flat
    +   *   if the parent column is configured as local Dictionary column, then it gets the child column
    +   *   count and then sets the primitive child column as local dictionary column if it is string
    +   *   datatype column
    +   * Handling for both localDictionary Include and exclude columns:
    +   * There will be basically four scenarios which are
    +   * -------------------------------------------------------
    +   * | Local_Dictionary_include | Local_Dictionary_Exclude |
    +   * -------------------------------------------------------
    +   * |   Not Defined            |     Not Defined          |
    +   * |   Not Defined            |      Defined             |
    +   * |   Defined                |     Not Defined          |
    +   * |   Defined                |      Defined             |
    +   * -------------------------------------------------------
    +   * 1. when the both local dictionary include and exclude is not defined, then set all the no
    +   * dictionary string datatype columns as local dictionary generate columns
    +   * 2. set all the no dictionary string datatype columns as local dictionary columns except the
    +   * columns present in local dictionary exclude
    +   * 3. & 4. when local dictionary include is defined, no need to check dictionary exclude columns
    +   * configured or not, we just need to set only the columns present in local dictionary include as
    +   * local dictionary columns
    +   * @param columns
    +   * @param mainTableProperties
    +   */
    +  public static void setLocalDictColumnsToWrapperSchema(List<ColumnSchema> columns,
    +      Map<String, String> mainTableProperties) {
    +    String isLocalDictEnabledForMainTable =
    +        mainTableProperties.get(CarbonCommonConstants.LOCAL_DICT_ENABLE);
    +    String localDictIncludeColumnsOfMainTable =
    +        mainTableProperties.get(CarbonCommonConstants.LOCAL_DICT_INCLUDE);
    +    String localDictExcludeColumnsOfMainTable =
    +        mainTableProperties.get(CarbonCommonConstants.LOCAL_DICT_EXCLUDE);
    +    if (null != isLocalDictEnabledForMainTable && Boolean
    +        .parseBoolean(isLocalDictEnabledForMainTable)) {
    +      int childColumnCount = 0;
    +      for (ColumnSchema column : columns) {
    +        // for complex type columns, user gives the parent column as local dictionary column and
    +        // only the string primitive type child column will be set as local dictionary column in the
    +        // schema
    +        if (childColumnCount > 0) {
    +          if (column.getDataType().equals(DataTypes.STRING)) {
    +            column.setLocalDictColumn(true);
    +            childColumnCount -= 1;
    +          } else {
    +            childColumnCount -= 1;
    +          }
    +        }
    +        // if complex column is defined in local dictionary include column, then get the child
    +        // columns and set the string datatype child type as local dictionary column
    +        if (column.getNumberOfChild() > 0 && null != localDictIncludeColumnsOfMainTable
    +            && localDictIncludeColumnsOfMainTable.toLowerCase()
    +            .contains(column.getColumnName().toLowerCase())) {
    +          childColumnCount = column.getNumberOfChild();
    +        }
    +        if (null == localDictIncludeColumnsOfMainTable) {
    +          // if local dictionary exclude columns is not defined, then set all the no dictionary
    +          // string datatype column
    +          if (null == localDictExcludeColumnsOfMainTable) {
    +            // column should be no dictionary string datatype column
    +            if (column.isDimensionColumn() && column.getDataType().equals(DataTypes.STRING)
    +                && !column.hasEncoding(Encoding.DICTIONARY)) {
    +              column.setLocalDictColumn(true);
    +            }
    +            // if local dictionary exclude columns is defined, then set for all no dictionary string
    +            // datatype columns except excluded columns
    +          } else {
    +            if (column.isDimensionColumn() && column.getDataType().equals(DataTypes.STRING)
    +                && !column.hasEncoding(Encoding.DICTIONARY) && !localDictExcludeColumnsOfMainTable
    +                .toLowerCase().contains(column.getColumnName().toLowerCase())) {
    +              column.setLocalDictColumn(true);
    +            }
    +          }
    +        } else {
    +          // if local dict columns alre not configured, set for all no dictionary string datatype
    +          // column
    +          if (column.isDimensionColumn() && column.getDataType().equals(DataTypes.STRING) && !column
    +              .hasEncoding(Encoding.DICTIONARY) && localDictIncludeColumnsOfMainTable.toLowerCase()
    --- End diff --
   
    same as above comment


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2375#discussion_r196308657
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ---
    @@ -289,7 +290,79 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
           noDictionaryDims, msrs, dims)
         if (groupCols != null) {
           throw new MalformedCarbonCommandException(
    -        s"${CarbonCommonConstants.COLUMN_GROUPS} is deprecated")
    +        s"${ CarbonCommonConstants.COLUMN_GROUPS } is deprecated")
    +    }
    +
    +    // validate the local dictionary property if defined
    +    if (tableProperties.get(CarbonCommonConstants.LOCAL_DICT_ENABLE).isDefined) {
    +      // if any invalid value is configured for LOCAL_DICT_ENABLE, then default value will be
    +      // considered which is true
    +      if (!CarbonScalaUtil
    +        .castStringToBoolean(tableProperties(CarbonCommonConstants.LOCAL_DICT_ENABLE))) {
    --- End diff --
   
    You can directly use
    ```
    Try(tableProperties(CarbonCommonConstants.LOCAL_DICT_ENABLE).toBoolean) match {
      case Success(value) =>
     case Failure(e) =>
    }
    ```


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2375#discussion_r196308732
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ---
    @@ -289,7 +290,79 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
           noDictionaryDims, msrs, dims)
         if (groupCols != null) {
           throw new MalformedCarbonCommandException(
    -        s"${CarbonCommonConstants.COLUMN_GROUPS} is deprecated")
    +        s"${ CarbonCommonConstants.COLUMN_GROUPS } is deprecated")
    +    }
    +
    +    // validate the local dictionary property if defined
    +    if (tableProperties.get(CarbonCommonConstants.LOCAL_DICT_ENABLE).isDefined) {
    +      // if any invalid value is configured for LOCAL_DICT_ENABLE, then default value will be
    +      // considered which is true
    +      if (!CarbonScalaUtil
    +        .castStringToBoolean(tableProperties(CarbonCommonConstants.LOCAL_DICT_ENABLE))) {
    +        LOGGER
    +          .debug(
    +            "invalid value is configured for local_dict_enable, considering the defaut value")
    +        tableProperties.put(CarbonCommonConstants.LOCAL_DICT_ENABLE,
    +          CarbonCommonConstants.LOCAL_DICT_ENABLE_DEFAULT)
    +      }
    +    } else {
    +      // if LOCAL_DICT_ENABLE is not defined, consider the default value which is true
    +      tableProperties.put(CarbonCommonConstants.LOCAL_DICT_ENABLE,
    +        CarbonCommonConstants.LOCAL_DICT_ENABLE_DEFAULT)
    +    }
    +
    +    // validate the local dictionary threshold property if defined
    +    if (tableProperties.get(CarbonCommonConstants.LOCAL_DICT_THRESHOLD).isDefined) {
    +      // if any invalid value is configured for LOCAL_DICT_THRESHOLD, then default value will be
    +      // considered which is 1000
    +      if (!CarbonScalaUtil.castStringToInt(tableProperties(CarbonCommonConstants
    +        .LOCAL_DICT_THRESHOLD)) ||
    --- End diff --
   
    same as above


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2375#discussion_r196309547
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ---
    @@ -1044,13 +1177,13 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
        * Matching the decimal(10,0) data type and returning the same.
        */
       private lazy val decimalType =
    -  DECIMAL ~ (("(" ~> numericLit <~ ",") ~ (numericLit <~ ")")).? ^^ {
    -    case decimal ~ precisionAndScale => if (precisionAndScale.isDefined) {
    -      s"decimal(${ precisionAndScale.get._1 }, ${ precisionAndScale.get._2 })"
    -    } else {
    -      s"decimal(10,0)"
    +    DECIMAL ~ (("(" ~> numericLit <~ ",") ~ (numericLit <~ ")")).? ^^ {
    --- End diff --
   
    what is the change here?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user akashrn5 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2375#discussion_r196310318
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ---
    @@ -1044,13 +1177,13 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
        * Matching the decimal(10,0) data type and returning the same.
        */
       private lazy val decimalType =
    -  DECIMAL ~ (("(" ~> numericLit <~ ",") ~ (numericLit <~ ")")).? ^^ {
    -    case decimal ~ precisionAndScale => if (precisionAndScale.isDefined) {
    -      s"decimal(${ precisionAndScale.get._1 }, ${ precisionAndScale.get._2 })"
    -    } else {
    -      s"decimal(10,0)"
    +    DECIMAL ~ (("(" ~> numericLit <~ ",") ~ (numericLit <~ ")")).? ^^ {
    --- End diff --
   
    these changes are done by mistake while formatting, i will revert these unwanted changes


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user akashrn5 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2375#discussion_r196365666
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -150,6 +150,16 @@
     
       private boolean hasDataMapSchema;
     
    +  /**
    +   * is local dictionary generation enabled for the table
    +   */
    +  private boolean isLocalDictionaryEnabled;
    --- End diff --
   
    in stead of reading from table properties again and again and then parsing and then checking for true or false, it is better to add in carbon table, which will be build one time, and it would be easy to know whether enabled and to find default values.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6377/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6378/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6379/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5214/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5325/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6382/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5217/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5326/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5327/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5328/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5329/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2375
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5330/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user akashrn5 closed the pull request at:

    https://github.com/apache/carbondata/pull/2375


---
12