[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...


qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r204969187
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    +      
    +       Users will be able to pass following properties in create table command:
    +      
    +       | Properties | Default value | Description |
    +       | ---------- | ------------- | ----------- |
    +       | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table |
    +       | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
    +       | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
    +       | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |
    +        
    +
    +### Example:
    +
    +  ```
    +  CREATE TABLE carbontable(
    +            
    +              column1 string,
    +            
    +              column2 string,
    +            
    +              column3 LONG )
    +            
    +    STORED BY 'carbondata'
    +    TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true',’LOCAL_DICTIONARY_THRESHOLD'='1000',
    --- End diff --
   
    The `’` before `LOCAL_DICTIONARY_THRESHOLD` is wrong; it should be a straight quote `'`.
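
A minimal sketch of a check that would catch this kind of typo before a doc example is merged. The helper name `find_curly_quotes` and the sample string are assumptions made for illustration; nothing here is part of CarbonData.

```python
from typing import List, Tuple

# Typographic quote characters that SQL parsers do not treat as string delimiters.
CURLY_QUOTES = {"\u2018", "\u2019", "\u201c", "\u201d"}


def find_curly_quotes(sql: str) -> List[Tuple[int, str]]:
    """Return (index, character) for every typographic quote in the statement."""
    return [(i, ch) for i, ch in enumerate(sql) if ch in CURLY_QUOTES]


# Hypothetical sample modelled on the line under review; \u2019 is the stray curly quote.
ddl = (
    "TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true',"
    "\u2019LOCAL_DICTIONARY_THRESHOLD'='1000')"
)

for index, char in find_curly_quotes(ddl):
    print(f"typographic quote {char!r} at offset {index}")
```

Running such a check over the SQL fences in the docs would flag the same character in the `ALTER TABLE` example as well.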


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r204968918
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    --- End diff --
   
    The indentation of ‘       By default’ is wrong.


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r204969612
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -333,6 +373,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          ```
          ALTER TABLE test_db.carbon CHANGE a1 a1 DECIMAL(18,2)
          ```
    +   - **SET and UNSET for Local Dictionary Properties**
    +  
    +      When set command is used, all the newly set properties will override the corresponding old properties if exists.
    +    
    +      Example to SET Local Dictionary Properties:
    +       ```
    +      ALTER TABLE tablename SET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='false',’LOCAL_DICTIONARY_THRESHOLD'='1000','LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
    +       ```
    +      When Local Dictionary Properties are unset, default of Local Dictionary Enable will be changed to true, default of Local Dictionary Threshold will be changed to 10000, and columns for Local Dictionary Include by default will be all no-dictionary String/Varchar datatype columns.
    --- End diff --
   
    No need to repeat the defaults here; better to change it to
    `When Local Dictionary properties are unset, the corresponding default values will be used for those properties.`
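
The SET/UNSET behaviour under discussion can be modelled in a few lines. This is only a sketch under stated assumptions: the function names are hypothetical, the property store is a plain dict, and the two defaults are the ones listed in the properties table of the diff. It is not CarbonData's implementation.

```python
from typing import Dict, Iterable

# Defaults taken from the properties table in the diff under review.
DEFAULTS = {
    "LOCAL_DICTIONARY_ENABLE": "true",
    "LOCAL_DICTIONARY_THRESHOLD": "10000",
}


def set_props(current: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """SET: newly supplied properties override existing ones with the same key."""
    return {**current, **new}


def unset_props(current: Dict[str, str], names: Iterable[str]) -> Dict[str, str]:
    """UNSET: drop the explicit values so the defaults apply again."""
    dropped = set(names)
    return {k: v for k, v in current.items() if k not in dropped}


def effective(current: Dict[str, str]) -> Dict[str, str]:
    """Explicit table properties layered over the defaults."""
    return {**DEFAULTS, **current}


props = set_props({}, {"LOCAL_DICTIONARY_ENABLE": "false",
                       "LOCAL_DICTIONARY_THRESHOLD": "1000"})
props = unset_props(props, ["LOCAL_DICTIONARY_THRESHOLD"])
print(effective(props))
```

After the UNSET, the threshold falls back to its default while the explicitly set `LOCAL_DICTIONARY_ENABLE` value is kept, which is why a single sentence about defaults is enough in the doc.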


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r204969813
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    +      
    +       Users will be able to pass following properties in create table command:
    +      
    +       | Properties | Default value | Description |
    +       | ---------- | ------------- | ----------- |
    +       | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table |
    +       | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
    +       | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
    +       | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |
    +        
    --- End diff --
   
    What about the limitations? For example, can local dictionary columns work with:
    1. sort_columns?
    2. dictionary include?
    3. complex data types?
    4. etc.


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r204969011
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    --- End diff --
   
    +1


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r204970343
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    +      
    +       Users will be able to pass following properties in create table command:
    +      
    +       | Properties | Default value | Description |
    +       | ---------- | ------------- | ----------- |
    +       | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table |
    +       | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
    --- End diff --
   
    Add more description for it, such as `If the cardinality exceeds the threshold, this column will not use local dictionary encoding. In this case, the data loading performance will decrease, since there is a rollback procedure for local dictionary encoding.`
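
To make the threshold behaviour concrete, here is a toy model of dictionary encoding with a cardinality cap. It is a sketch under assumptions: `encode_column` is a hypothetical name, and the fallback simply returns the raw values, which stands in for the rollback mentioned above without claiming to match how CarbonData implements it.

```python
from typing import Dict, List, Optional, Tuple, Union


def encode_column(
    values: List[str], threshold: int = 10000
) -> Tuple[Optional[Dict[str, int]], List[Union[int, str]]]:
    """Dictionary-encode values; give up once distinct values exceed the threshold.

    Returns (dictionary, encoded values) on success, or (None, raw values)
    when the cardinality cap is hit and the encoding work is discarded.
    """
    dictionary: Dict[str, int] = {}
    encoded: List[Union[int, str]] = []
    for value in values:
        if value not in dictionary:
            if len(dictionary) >= threshold:
                # Cap reached: throw away the partial encoding (the "rollback").
                return None, list(values)
            dictionary[value] = len(dictionary)
        encoded.append(dictionary[value])
    return dictionary, encoded


print(encode_column(["a", "b", "a"], threshold=2))
print(encode_column(["a", "b", "c"], threshold=2))
```

The second call shows where the extra loading cost comes from: the values seen before the cap was hit were encoded and then thrown away.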


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r204970695
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    +      
    +       Users will be able to pass following properties in create table command:
    +      
    +       | Properties | Default value | Description |
    +       | ---------- | ------------- | ----------- |
    +       | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table |
    +       | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
    --- End diff --
   
    What is the scope of `cardinality` here: page, blocklet, block, segment, or table level?
    Previously, for global cardinality, it was table level.


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ndwangsen commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r204973033
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    --- End diff --
   
    "By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns." Is the data loading performance OK with this default?


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user praveenmeenakshi56 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r205094917
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    --- End diff --
   
    Data loading performance is affected by only 8%. Tested with 3.5 billion records (103 columns).


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user praveenmeenakshi56 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r205102030
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    +      
    +       Users will be able to pass following properties in create table command:
    +      
    +       | Properties | Default value | Description |
    +       | ---------- | ------------- | ----------- |
    +       | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table |
    +       | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
    +       | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
    +       | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |
    +        
    --- End diff --
   
    All of the above are supported with Local Dictionary. The additional information is already present in the design document in the JIRA; please refer to it.


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user praveenmeenakshi56 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2520#discussion_r205108171
 
    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
          TBLPROPERTIES ('streaming'='true')
          ```
     
    +  - **Local Dictionary Configuration**
    +  
    +  Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
    +  1. Getting more compression on dimension columns with less cardinality.
    +  2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
    +  3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
    +
    +       By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
    +      
    +       Users will be able to pass following properties in create table command:
    +      
    +       | Properties | Default value | Description |
    +       | ---------- | ------------- | ----------- |
    +       | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table |
    +       | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
    --- End diff --
   
    It is Segment/Task Level. Please refer to JIRA 2584.


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user praveenmeenakshi56 closed the pull request at:

    https://github.com/apache/carbondata/pull/2520


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
GitHub user praveenmeenakshi56 reopened a pull request:

    https://github.com/apache/carbondata/pull/2520

    [CARBONDATA-2750] Added Documentation for Local Dictionary Support

    ### What has been added?
    Documentation for Local Dictionary Support has been added.
     - [x] Any interfaces changed?
     NA
     - [x] Any backward compatibility impacted?
     NA
     - [x] Document update required?
    Document has been added in this PR.
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
     NA    
     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/praveenmeenakshi56/carbondata local_dict_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2520
   
----
commit 0e45c06137eac49508de1844bfc31321ba29acf2
Author: praveenmeenakshi56 <praveenmeenakshi56@...>
Date:   2018-07-18T06:07:29Z

    Added Documentation for Local Dictionary Support
   
    Conflicts:
    docs/data-management-on-carbondata.md

commit 9093c09463758aafca590ee4fd476a679902fe94
Author: praveenmeenakshi56 <praveenmeenakshi56@...>
Date:   2018-07-25T15:08:05Z

    Added Documentation for Local Dictionary Support

----


---

[GitHub] carbondata issue #2520: [CARBONDATA-2750] Added Documentation for Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sgururajshetty commented on the issue:

    https://github.com/apache/carbondata/pull/2520
 
    LGTM



---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2520
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7512/



---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2520
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6266/



---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2520


---