GitHub user praveenmeenakshi56 opened a pull request:
https://github.com/apache/carbondata/pull/2590

[CARBONDATA-2750] Updated documentation on Local Dictionary Supoort

Updated Documentation on Local Dictionary Support.
Changed default scenario for Local dictionary to false.

 - [ ] Any interfaces changed? NA
 - [ ] Any backward compatibility impacted? NA
 - [ ] Document update required? Document Updated
 - [ ] Testing done
       Please provide details on
       - Whether new unit test cases have been added or why no new tests are required?
       - How it is tested? Please attach test report.
       - Is it a performance related change? Please attach the performance test report.
       - Any additional information to help reviewers in testing this change.
       NA
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/praveenmeenakshi56/carbondata local_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2590.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2590

----
commit 540066df96d96f8a4f980441e9b6666b3636aa31
Author: praveenmeenakshi56 <praveenmeenakshi56@...>
Date:   2018-07-31T10:19:10Z

    Updated documentation on Local Dictionary Supoort

----
---
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r206484679

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
--- End diff --

Please add a note listing the data types that are not supported.
---
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r206485268

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+ By default, Local Dictionary will be disabled.
  Users will be able to pass following properties in create table command:
  | Properties | Default value | Description |
  | ---------- | ------------- | ----------- |
- | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table |
- | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |
+ | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns which are not included in dictionary include| Columns for which Local Dictionary is generated. |
--- End diff --

"which are not included in dictionary include" -- please refine.
---
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r206485782

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
--- End diff --

Please explain: what is the cost of enabling local dictionary?
---
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r206486006

--- Diff: docs/data-management-on-carbondata.md ---
@@ -508,6 +511,9 @@ Users can specify which columns to include and exclude for local dictionary gene
  ```
  ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE')
  ```
+
+ **NOTE:** For old tables, by default, local dictionary is disabled. If user wants local dictionary, he/she can enable/disable local dictionary for new data on those tables at his/her discretion.
--- End diff --

Change "he/she" to "user".
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2590

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6078/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6396/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7675/
---
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r206553068

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
-
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ **Bottleneck for Local Dictionary:** The memory size will increase when local dictionary is enabled.
--- End diff --

Please change "bottleneck" to "The cost"
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7683/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6407/
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2590

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6083/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7708/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6434/
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2590

SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6100/
---
Github user sraghunandan commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r207069841

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
--- End diff --

Add a small sentence on what local dictionary means.
---
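For readers following the review, the general idea behind a dictionary-encoded column (as described in the three benefits quoted in the diff) can be sketched as below. This is a minimal illustration of plain dictionary encoding, not CarbonData's implementation; the function names `dictionary_encode` and `filter_equals` are hypothetical and exist only for this example.

```python
def dictionary_encode(values):
    """Store each unique value once and replace every row with a small integer code."""
    dictionary = {}  # value -> code, assigned in order of first appearance
    encoded = []
    for value in values:
        code = dictionary.setdefault(value, len(dictionary))
        encoded.append(code)
    return dictionary, encoded


def filter_equals(dictionary, encoded, literal):
    """Evaluate `column == literal` on the encoded data, without decoding any row."""
    code = dictionary.get(literal)
    if code is None:
        return []  # literal never occurs in this column
    return [row for row, c in enumerate(encoded) if c == code]


# Hypothetical low-cardinality string column.
column = ["IN", "US", "IN", "CN", "US", "IN"]
dictionary, encoded = dictionary_encode(column)
matching_rows = filter_equals(dictionary, encoded, "IN")
```

The filter compares integer codes rather than strings, which is the sense in which "filter will be done on encoded data" in the quoted text.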
Github user sraghunandan commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r207070762

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
-
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled.
+
+ **NOTE:** Following Data Types are not Supported for Local Dictionary:
+ * SMALLINT
+ * INTEGER
+ * BIGINT
+ * DOUBLE
+ * DECIMAL
+ * TIMESTAMP
+ * DATE
+ * CHAR
+ * BOOLEAN
+
+ By default, Local Dictionary will be disabled.
  Users will be able to pass following properties in create table command:
  | Properties | Default value | Description |
  | ---------- | ------------- | ----------- |
- | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table |
- | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |
--- End diff --

The description is not correct. It needs to explain what the threshold means and what happens beyond the threshold.
---
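One way to picture a cardinality threshold is sketched below. The thread only states that the threshold is "the maximum cardinality for local dictionary generation"; the fallback to storing raw values once the threshold is exceeded is an assumption made for this illustration, and `encode_with_threshold` is a hypothetical name, not a CarbonData API.

```python
def encode_with_threshold(values, threshold):
    """Dictionary-encode `values` unless the number of unique values exceeds `threshold`.

    Returns (dictionary, encoded) on success. If a new value would push the
    dictionary past `threshold` entries, the dictionary is abandoned and the
    raw values are returned instead (assumed fallback behaviour).
    """
    dictionary = {}
    encoded = []
    for value in values:
        if value not in dictionary:
            if len(dictionary) >= threshold:
                # Cardinality exceeded: give up on the dictionary for this column.
                return None, list(values)
            dictionary[value] = len(dictionary)
        encoded.append(dictionary[value])
    return dictionary, encoded


within = encode_with_threshold(["a", "b", "a"], threshold=2)
beyond = encode_with_threshold(["a", "b", "c"], threshold=2)
```

In this sketch the work spent building the dictionary is discarded when the threshold is crossed, which suggests why a threshold description would also need to say what happens beyond it.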
Github user sraghunandan commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r207069983

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
--- End diff --

Remove "No-dictionary".
---
Github user sraghunandan commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r207070525

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
-
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled.
--- End diff --

Can add a sentence on why it will increase.
---
Github user sraghunandan commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2590#discussion_r207070939

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
  - **Local Dictionary Configuration**
- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
-
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled.
+
+ **NOTE:** Following Data Types are not Supported for Local Dictionary:
+ * SMALLINT
+ * INTEGER
+ * BIGINT
+ * DOUBLE
+ * DECIMAL
+ * TIMESTAMP
+ * DATE
+ * CHAR
+ * BOOLEAN
+
+ By default, Local Dictionary will be disabled.
  Users will be able to pass following properties in create table command:
  | Properties | Default value | Description |
  | ---------- | ------------- | ----------- |
- | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table |
- | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |
+ | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified in dictionary include| Columns for which Local Dictionary is generated. |
--- End diff --

If I don't specify this property, what is the behaviour?
---
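The default stated in the quoted table row ("all string/varchar columns not specified in dictionary include") can be read as the selection rule sketched below. This is only an interpretation of the documented default: the function `local_dictionary_columns`, its parameters, and the handling of an exclude list are hypothetical and are not taken from CarbonData's code.

```python
def local_dictionary_columns(schema, dictionary_include=(), local_include=None,
                             local_exclude=()):
    """Pick the columns that would get a local dictionary.

    schema: list of (column_name, data_type) pairs.
    local_include=None models the property being left unspecified, in which
    case every eligible string/varchar column is selected (the documented default).
    """
    eligible = [
        name for name, data_type in schema
        if data_type.lower() in ("string", "varchar")
        and name not in dictionary_include
    ]
    if local_include is not None:
        eligible = [name for name in eligible if name in local_include]
    # Assumption: explicitly excluded columns are always dropped.
    return [name for name in eligible if name not in local_exclude]


# Hypothetical table schema.
schema = [("id", "int"), ("name", "string"), ("city", "string"), ("note", "varchar")]
default_cols = local_dictionary_columns(schema, dictionary_include=("city",))
explicit_cols = local_dictionary_columns(schema, local_include=("name",))
```

Under this reading, leaving the property unset selects `name` and `note` here, because `id` is not a string/varchar column and `city` is already listed in dictionary include.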