[GitHub] carbondata pull request #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

classic Classic list List threaded Threaded
48 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
GitHub user akashrn5 opened a pull request:

    https://github.com/apache/carbondata/pull/2401

    [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Local dictionary support for alter table, preaggregate and varchar datatype

    What changes were proposed in this pull request?
    In this PR,
    local dictionary support is added for alter table, preaggregate and varChar datatype
    All the validations related to above features are taken care in this PR
   
    How was this patch tested?
    UTs and SDV test cases are added in the same PR
   
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/akashrn5/incubator-carbondata local_dict_for_alter_preagg

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2401
   
----

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6481/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5312/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6482/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5313/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5403/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5404/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user akashrn5 commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6512/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5344/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6539/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5368/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5443/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6570/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2401
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5398/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2401#discussion_r198387868
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -1110,9 +1110,18 @@ private static void setLocalDictInfo(CarbonTable table, TableInfo tableInfo) {
         String localDictionaryThreshold = tableInfo.getFactTable().getTableProperties()
             .get(CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD);
         if (null != isLocalDictionaryEnabled) {
    -      table.setLocalDictionaryEnabled(Boolean.parseBoolean(isLocalDictionaryEnabled));
    +      // if table is altered then, for same key in table properties, multiple values will be
    --- End diff --
   
    remove this code if unused


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2401#discussion_r198389740
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -3067,44 +3067,56 @@ public static void setLocalDictColumnsToWrapperSchema(List<ColumnSchema> columns
             }
             // if complex column is defined in local dictionary include column, then get the child
             // columns and set the string datatype child type as local dictionary column
    -        if (column.getNumberOfChild() > 0 && null != localDictIncludeColumnsOfMainTable) {
    -          listOfDictionaryIncludeColumns = localDictIncludeColumnsOfMainTable.split(",");
    +        if (column.getNumberOfChild() > 0 && null != localDictIncludeColumns) {
    +          listOfDictionaryIncludeColumns = localDictIncludeColumns.trim().split("\\s*,\\s*");
    --- End diff --
   
    If possible move trim and split outside of loop as it output will be same and it is happening in loop which will impact the performance.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2401#discussion_r198436867
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -1110,9 +1110,18 @@ private static void setLocalDictInfo(CarbonTable table, TableInfo tableInfo) {
         String localDictionaryThreshold = tableInfo.getFactTable().getTableProperties()
             .get(CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD);
         if (null != isLocalDictionaryEnabled) {
    -      table.setLocalDictionaryEnabled(Boolean.parseBoolean(isLocalDictionaryEnabled));
    +      // if table is altered then, for same key in table properties, multiple values will be
    +      // appended, so after splitting, get the first value for enable and threshold, and that will
    +      // be latest configurations for the table for further local dictionary generation, because for
    +      // alter table add operation, local dictionary enable and threshold configuration is not
    +      // allowed
    +      String[] isLocalDictEnabled = isLocalDictionaryEnabled.split(",");
    --- End diff --
   
    Please don't handle the default property here. Better handle in parser like don't add default values if the flow through alter


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2401#discussion_r198440267
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -3041,24 +3041,30 @@ public static String getBlockId(AbsoluteTableIdentifier identifier, String fileP
        * @param mainTableProperties
        */
       public static void setLocalDictColumnsToWrapperSchema(List<ColumnSchema> columns,
    -      Map<String, String> mainTableProperties) {
    -    String isLocalDictEnabledForMainTable =
    -        mainTableProperties.get(CarbonCommonConstants.LOCAL_DICTIONARY_ENABLE);
    -    String localDictIncludeColumnsOfMainTable =
    -        mainTableProperties.get(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE);
    -    String localDictExcludeColumnsOfMainTable =
    -        mainTableProperties.get(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE);
    +      Map<String, String> mainTableProperties, String isLocalDictEnabledForMainTable) {
         String[] listOfDictionaryIncludeColumns = null;
         String[] listOfDictionaryExcludeColumns = null;
    +    String localDictIncludeColumns =
    +        mainTableProperties.get(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE);
    +    String localDictExcludeColumns =
    +        mainTableProperties.get(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE);
    +    if (null != localDictIncludeColumns) {
    +      listOfDictionaryIncludeColumns = localDictIncludeColumns.trim().split("\\s*,\\s*");
    --- End diff --
   
    what is this split required for?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2401#discussion_r198441426
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -615,4 +618,114 @@ object CarbonScalaUtil {
           sparkSession,
           schema)._1.asInstanceOf[Object]
       }
    +
    +  /**
    +   * this method validates the local dictionary columns configurations
    +   *
    +   * @param tableProperties
    +   * @param localDictColumns
    +   */
    +  def validateLocalDictionaryColumns(tableProperties: mutable.Map[String, String],
    +      localDictColumns: Seq[String]): Unit = {
    +    var dictIncludeColumns: Seq[String] = Seq[String]()
    +
    +    // check if the duplicate columns are specified in table schema
    +    if (localDictColumns.distinct.lengthCompare(localDictColumns.size) != 0) {
    +      val duplicateColumns = (dictIncludeColumns ++ localDictColumns)
    --- End diff --
   
    why adding empty Seq here?


---
123