GitHub user akashrn5 opened a pull request:
https://github.com/apache/carbondata/pull/2375 [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Support adding local dictionary configuration in create table statement and show the configs in describe formatted table ## What changes were proposed in this pull request? In this PR, in order to support local dictionary, create table changes are made to support local dictionary configurations as table properties and showing those properties in describe formatted command based on whether the local dictionary enabled or disabled. Highlights: basically we will have four properties 1. LOCAL_DICT_ENABLE => whether to enable or disable local dictionary 2. LOCAL_DICT_THRESHOLD => threshold property for the column to generate local dictionary 3. LOCAL_DICT_INCLUDE => columns for which local dictionary needs to be generated 4. LOCAL_DICT_EXCLUDE => columns for which local dictionary should not be generated ## How was this patch tested? Manual testing, and UTs are added in another PR. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? NA - [x] Any backward compatibility impacted? NA - [x] Document update required? YES, it will done in another PR - [x] Testing done manual testing is done, UTs will be added in another PR Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata localdict_prop_desc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2375.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2375 ---- commit 5308e1fd3602f4a24d78694b416fda7b6f3a9bcd Author: akashrn5 <akashnilugal@...> Date: 2018-06-06T15:03:39Z Support adding local dictionary configuration in create table statement and show the configs in describe formatted table ---- --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5292/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2375 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6342/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2375 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5180/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5298/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2375 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6348/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2375 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5186/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5299/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2375 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6356/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2375 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5194/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5306/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196304734 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -883,6 +883,37 @@ public static final String COLUMN_GROUPS = "column_groups"; public static final String DICTIONARY_EXCLUDE = "dictionary_exclude"; public static final String DICTIONARY_INCLUDE = "dictionary_include"; + + /** + * Table property to enable or disable local dictionary generation + */ + public static final String LOCAL_DICT_ENABLE = "local_dict_enable"; --- End diff -- If this properties exposed to the user, then give the proper name `local_dictionary_enable` --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196304856 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -883,6 +883,37 @@ public static final String COLUMN_GROUPS = "column_groups"; public static final String DICTIONARY_EXCLUDE = "dictionary_exclude"; public static final String DICTIONARY_INCLUDE = "dictionary_include"; + + /** + * Table property to enable or disable local dictionary generation + */ + public static final String LOCAL_DICT_ENABLE = "local_dict_enable"; + + /** + * default value for local dictionary generation + */ + public static final String LOCAL_DICT_ENABLE_DEFAULT = "true"; + + /** + * Threshold value for local dictionary + */ + public static final String LOCAL_DICT_THRESHOLD = "local_dict_threshold"; --- End diff -- same as above comment, provide full meaningful name. do for other properties as well. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196305311 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java --- @@ -150,6 +150,16 @@ private boolean hasDataMapSchema; + /** + * is local dictionary generation enabled for the table + */ + private boolean isLocalDictionaryEnabled; --- End diff -- Don't add extra properties, just get from table properties --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196305365 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java --- @@ -150,6 +150,16 @@ private boolean hasDataMapSchema; + /** + * is local dictionary generation enabled for the table + */ + private boolean isLocalDictionaryEnabled; + + /** + * local dictionary generation threshold + */ + private int localDictionaryThreshold; --- End diff -- Don't add extra fields, just get from table properties --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196305431 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java --- @@ -1032,5 +1073,32 @@ public static void updateTableByTableInfo(CarbonTable table, TableInfo tableInfo } table.hasDataMapSchema = null != tableInfo.getDataMapSchemaList() && tableInfo.getDataMapSchemaList().size() > 0; + setLocalDictInfo(table, tableInfo); + } + + /** + * This method sets whether the local dictionary is enabled or not, and the local dictionary + * threshold, if not defined default value are considered. + * @param table + * @param tableInfo + */ + private static void setLocalDictInfo(CarbonTable table, TableInfo tableInfo) { --- End diff -- I guess this method not required --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196305534 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -17,12 +17,31 @@ package org.apache.carbondata.core.util; -import java.io.*; +import java.io.BufferedReader; +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.Closeable; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.io.ObjectInputStream; +import java.io.UnsupportedEncodingException; import java.math.BigDecimal; import java.nio.ByteBuffer; import java.nio.charset.Charset; import java.security.PrivilegedExceptionAction; -import java.util.*; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.Set; +import java.util.UUID; --- End diff -- use `import java.util.*;` --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196305555 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -17,12 +17,31 @@ package org.apache.carbondata.core.util; -import java.io.*; +import java.io.BufferedReader; +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.Closeable; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.io.ObjectInputStream; +import java.io.UnsupportedEncodingException; --- End diff -- import java.io.*; --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196305827 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala --- @@ -550,6 +550,11 @@ class TableNewProcessor(cm: TableModel) { } } + // check whether the column is a local dictionary column and set in column schema + if (null != cm.tableProperties.asJava) { --- End diff -- Don't need to convert to java for null check --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2375#discussion_r196307015 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -3000,5 +3019,95 @@ public static String getBlockId(AbsoluteTableIdentifier identifier, String fileP } return blockId; } + + /** + * sets the local dictionary columns to wrapper schema, if the table property local_dict_include + * is defined, then those columns will be set as local dictionary columns, if not, all the no + * dictionary string datatype columns are set as local dictionary columns. + * Handling for complexTypes:: + * Since the column structure will be flat + * if the parent column is configured as local Dictionary column, then it gets the child column + * count and then sets the primitive child column as local dictionary column if it is string + * datatype column + * Handling for both localDictionary Include and exclude columns: + * There will be basically four scenarios which are + * ------------------------------------------------------- + * | Local_Dictionary_include | Local_Dictionary_Exclude | + * ------------------------------------------------------- + * | Not Defined | Not Defined | + * | Not Defined | Defined | + * | Defined | Not Defined | + * | Defined | Defined | + * ------------------------------------------------------- + * 1. when the both local dictionary include and exclude is not defined, then set all the no + * dictionary string datatype columns as local dictionary generate columns + * 2. set all the no dictionary string datatype columns as local dictionary columns except the + * columns present in local dictionary exclude + * 3. & 4. when local dictionary include is defined, no need to check dictionary exclude columns + * configured or not, we just need to set only the columns present in local dictionary include as + * local dictionary columns + * @param columns + * @param mainTableProperties + */ + public static void setLocalDictColumnsToWrapperSchema(List<ColumnSchema> columns, + Map<String, String> mainTableProperties) { + String isLocalDictEnabledForMainTable = + mainTableProperties.get(CarbonCommonConstants.LOCAL_DICT_ENABLE); + String localDictIncludeColumnsOfMainTable = + mainTableProperties.get(CarbonCommonConstants.LOCAL_DICT_INCLUDE); + String localDictExcludeColumnsOfMainTable = + mainTableProperties.get(CarbonCommonConstants.LOCAL_DICT_EXCLUDE); + if (null != isLocalDictEnabledForMainTable && Boolean + .parseBoolean(isLocalDictEnabledForMainTable)) { + int childColumnCount = 0; + for (ColumnSchema column : columns) { + // for complex type columns, user gives the parent column as local dictionary column and + // only the string primitive type child column will be set as local dictionary column in the + // schema + if (childColumnCount > 0) { + if (column.getDataType().equals(DataTypes.STRING)) { + column.setLocalDictColumn(true); + childColumnCount -= 1; + } else { + childColumnCount -= 1; + } + } + // if complex column is defined in local dictionary include column, then get the child + // columns and set the string datatype child type as local dictionary column + if (column.getNumberOfChild() > 0 && null != localDictIncludeColumnsOfMainTable + && localDictIncludeColumnsOfMainTable.toLowerCase() + .contains(column.getColumnName().toLowerCase())) { --- End diff -- You should not check string contains as partial string also can match, do proper split and then match --- |
Free forum by Nabble | Edit this page |