Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204969187 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + Users will be able to pass following properties in create table command: + + | Properties | Default value | Description | + | ---------- | ------------- | ----------- | + | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) | + | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. | + | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated | + + +### Example: + + ``` + CREATE TABLE carbontable( + + column1 string, + + column2 string, + + column3 LONG ) + + STORED BY 'carbondata' + TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true',‘LOCAL_DICTIONARY_THRESHOLD'='1000', --- End diff -- `‘` before `LOCAL_DICTIONARY_THRESHOLD` is wrong. --- |
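A stray quote character like the one flagged above is easy to miss by eye. The following is a minimal Python sketch of how such characters could be caught before a doc change is submitted. It assumes the offending character is a typographic left single quote (U+2018); the helper name `find_non_ascii_chars` and the sample string are hypothetical and not part of CarbonData.

```python
import unicodedata


def find_non_ascii_chars(ddl):
    """Return (index, char, unicode_name) for every non-ASCII character in ddl."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(ddl)
        if ord(char) > 127
    ]


# Sample fragment modelled on the example under review (assumed character: U+2018).
ddl = (
    "TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true',"
    "\u2018LOCAL_DICTIONARY_THRESHOLD'='1000')"
)

for index, char, name in find_non_ascii_chars(ddl):
    print(index, repr(char), name)
```

Running a check of this kind over the SQL examples in a documentation file would flag any quote that is not a plain ASCII `'`.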
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204968918 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. --- End diff -- The indent of " By default" is wrong --- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204969612 --- Diff: docs/data-management-on-carbondata.md --- @@ -333,6 +373,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ``` ALTER TABLE test_db.carbon CHANGE a1 a1 DECIMAL(18,2) ``` + - **SET and UNSET for Local Dictionary Properties** + + When set command is used, all the newly set properties will override the corresponding old properties if exists. + + Example to SET Local Dictionary Properties: + ``` + ALTER TABLE tablename SET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='false',‘LOCAL_DICTIONARY_THRESHOLD'='1000','LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2') + ``` + When Local Dictionary Properties are unset, default of Local Dictionary Enable will be changed to true, default of Local Dictionary Threshold will be changed to 10000, and columns for Local Dictionary Include by default will be all no-dictionary String/Varchar datatype columns. --- End diff -- No need to repeat the default scenario; better to change it to `When Local Dictionary properties are unset, corresponding default value will be used for those properties.` --- |
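The SET/UNSET behaviour discussed above (set values override old ones, unset properties revert to their defaults) can be modelled in a few lines of Python. This is only an illustrative sketch, not CarbonData's implementation: the function names are hypothetical, and only the two properties with simple scalar defaults from the table under review are modelled.

```python
# Defaults taken from the property table in the diff under review.
DEFAULTS = {
    "LOCAL_DICTIONARY_ENABLE": "true",
    "LOCAL_DICTIONARY_THRESHOLD": "10000",
}


def set_properties(current, new):
    """SET: newly set values override the corresponding old values."""
    merged = dict(current)
    merged.update(new)
    return merged


def unset_properties(current, keys):
    """UNSET: drop the explicit values so the defaults apply again."""
    return {key: value for key, value in current.items() if key not in keys}


def effective_properties(current):
    """Resolve explicit table properties against the defaults."""
    resolved = dict(DEFAULTS)
    resolved.update(current)
    return resolved


props = set_properties({}, {"LOCAL_DICTIONARY_ENABLE": "false",
                            "LOCAL_DICTIONARY_THRESHOLD": "1000"})
print(effective_properties(props))

props = unset_properties(props, {"LOCAL_DICTIONARY_ENABLE",
                                 "LOCAL_DICTIONARY_THRESHOLD"})
print(effective_properties(props))
```

Keeping the defaults in one place and resolving them at read time is what makes the shorter wording suggested by the reviewer sufficient: an unset property simply falls through to its default.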
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204969813 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + Users will be able to pass following properties in create table command: + + | Properties | Default value | Description | + | ---------- | ------------- | ----------- | + | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) | + | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. | + | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated | + --- End diff -- What about the limitations? For example, can local dictionary columns work with: 1. sort_columns? 2. dictionary include? 3. complex? 4. etc. --- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204969011 --- Diff: docs/data-management-on-carbondata.md --- @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. --- End diff -- +1 --- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204970343 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + Users will be able to pass following properties in create table command: + + | Properties | Default value | Description | + | ---------- | ------------- | ----------- | + | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) | --- End diff -- Add more description for it, such as `If the cardinality exceeds the threshold, this column will not use local dictionary encoding. And in this case, the data loading performance will decrease since there is a rollback procedure for local dictionary encoding.` --- |
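The threshold and rollback behaviour described in this comment can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not CarbonData's actual encoder: the function name, the return shape, and the in-memory dictionary are all hypothetical. It shows why exceeding the threshold costs loading time: the codes produced so far are thrown away and the column falls back to plain values.

```python
def encode_with_local_dictionary(values, threshold=10000):
    """Dictionary-encode a column, falling back to plain values when the
    number of distinct values would exceed the threshold.

    Returns ("dictionary", dictionary, codes) or ("plain", None, values).
    """
    values = list(values)
    dictionary = {}  # distinct value -> integer code
    codes = []
    for value in values:
        code = dictionary.get(value)
        if code is None:
            if len(dictionary) >= threshold:
                # Cardinality would exceed the threshold: discard the codes
                # built so far (the "rollback") and store the raw values.
                return "plain", None, values
            code = len(dictionary)
            dictionary[value] = code
        codes.append(code)
    return "dictionary", dictionary, codes


print(encode_with_local_dictionary(["cn", "us", "cn", "in"], threshold=3))
print(encode_with_local_dictionary(["cn", "us", "de", "in"], threshold=3))
```

In the first call the column has three distinct values and stays dictionary-encoded; in the second the fourth distinct value triggers the fallback, so the encoding work done up to that point is wasted.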
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204970695 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + Users will be able to pass following properties in create table command: + + | Properties | Default value | Description | + | ---------- | ------------- | ----------- | + | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) | --- End diff -- What is the scope of `cardinality` here? Page/Blocklet/Block/Segment/Table level? Previously for global cardinality, it is table level. --- |
Github user ndwangsen commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204973033 --- Diff: docs/data-management-on-carbondata.md --- @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. --- End diff -- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. ----- Is the data loading performance ok? --- |
Github user praveenmeenakshi56 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r205094917 --- Diff: docs/data-management-on-carbondata.md --- @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. --- End diff -- Data loading performance is affected by only 8%. Tested with 3.5 billion records (103 columns). --- |
Github user praveenmeenakshi56 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r205102030 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + Users will be able to pass following properties in create table command: + + | Properties | Default value | Description | + | ---------- | ------------- | ----------- | + | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) | + | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. | + | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated | + --- End diff -- All of the above are supported with Local Dictionary. The additional information is already present in the Design Document in the JIRA; please refer to it. --- |
Github user praveenmeenakshi56 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r205108171 --- Diff: docs/data-management-on-carbondata.md --- @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + Users will be able to pass following properties in create table command: + + | Properties | Default value | Description | + | ---------- | ------------- | ----------- | + | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) | --- End diff -- It is Segment/Task Level. Please refer to JIRA 2584. --- |
Github user praveenmeenakshi56 closed the pull request at:
https://github.com/apache/carbondata/pull/2520 --- |
GitHub user praveenmeenakshi56 reopened a pull request:
https://github.com/apache/carbondata/pull/2520

[CARBONDATA-2750] Added Documentation for Local Dictionary Support

### What has been added?
Documentation for Local Dictionary Support has been added.

- [x] Any interfaces changed?
NA
- [x] Any backward compatibility impacted?
NA
- [x] Document update required?
Document has been added in this PR.
- [x] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
NA
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/praveenmeenakshi56/carbondata local_dict_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2520.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2520

----
commit 0e45c06137eac49508de1844bfc31321ba29acf2
Author: praveenmeenakshi56 <praveenmeenakshi56@...>
Date: 2018-07-18T06:07:29Z

Added Documentation for Local Dictionary Support

Conflicts:
docs/data-management-on-carbondata.md

commit 9093c09463758aafca590ee4fd476a679902fe94
Author: praveenmeenakshi56 <praveenmeenakshi56@...>
Date: 2018-07-25T15:08:05Z

Added Documentation for Local Dictionary Support
----
--- |
Github user sgururajshetty commented on the issue:
https://github.com/apache/carbondata/pull/2520 LGTM --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2520 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7512/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2520 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6266/ --- |