
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...


Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207071343

--- Diff: docs/data-management-on-carbondata.md ---
@@ -524,6 +540,9 @@ Users can specify which columns to include and exclude for local dictionary gene
```
ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE')
```
+
+ **NOTE:** For old tables, by default, local dictionary is disabled. If user wants local dictionary, user can enable/disable local dictionary for new data on those tables at their discretion.
--- End diff --

Local dictionary is disabled for new tables as well. The note also needs to mention that it can be enabled for old tables.

---

[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207070998

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

- **Local Dictionary Configuration**

- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
1. Getting more compression on dimension columns with less cardinality.
2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
-
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled.
+
+ **NOTE:** Following Data Types are not Supported for Local Dictionary:
+ * SMALLINT
+ * INTEGER
+ * BIGINT
+ * DOUBLE
+ * DECIMAL
+ * TIMESTAMP
+ * DATE
+ * CHAR
+ * BOOLEAN
+
+ By default, Local Dictionary will be disabled.

Users will be able to pass following properties in create table command:

| Properties | Default value | Description |
| ---------- | ------------- | ----------- |
- | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table |
- | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |
+ | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified in dictionary include| Columns for which Local Dictionary is generated. |
| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |

**NOTE:** If the cardinality exceeds the threshold, this column will not use local dictionary encoding. And in this case, the data loading performance will decrease since there is a rollback procedure for local dictionary encoding.
--- End diff --

Should "rollback" be "fallback" here?
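
To make the threshold behaviour in the quoted NOTE concrete, here is a minimal Python sketch of dictionary encoding that falls back to the raw values once the cardinality exceeds the threshold. It is only an illustration of the idea described in the diff, not CarbonData's actual implementation; the function name `encode_with_local_dictionary` and its return shape are hypothetical.

```python
from typing import List, Optional, Tuple, Union


def encode_with_local_dictionary(
    values: List[str], threshold: int = 10000
) -> Tuple[Optional[List[str]], List[Union[int, str]]]:
    """Hypothetical helper: dictionary-encode one column's values.

    Returns (dictionary, data). When the number of distinct values exceeds
    `threshold`, the dictionary is None and data holds the raw values.
    """
    dictionary = {}  # distinct value -> surrogate key
    encoded = []
    for value in values:
        key = dictionary.get(value)
        if key is None:
            if len(dictionary) >= threshold:
                # Adding this value would push the cardinality past the
                # threshold: throw away the encoded work and fall back.
                return None, list(values)
            key = len(dictionary)
            dictionary[value] = key
        encoded.append(key)
    # Python dicts keep insertion order, so key i maps to entry i in this list.
    return list(dictionary), encoded


if __name__ == "__main__":
    print(encode_with_local_dictionary(["a", "b", "a", "c"], threshold=3))
    print(encode_with_local_dictionary(["a", "b", "c", "d"], threshold=3))
```

Note that the raw `values` have to stay available until encoding finishes so the fallback can return them, and the encoding work done before the fallback is wasted. That is one plausible reading of the memory and load-time cost the NOTE refers to.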

---

[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207071127

--- Diff: docs/data-management-on-carbondata.md ---
@@ -170,6 +183,9 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','LOCAL_DICTIONARY_THRESHOLD'='1000',
'LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
```
+
--- End diff --

The sentence is not easy to understand. It needs simpler language to explain the reason.
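
For the `TBLPROPERTIES` shown in the quoted diff, the following Python sketch shows one way the include/exclude defaults from the properties table could resolve to a set of columns. It is an assumption-laden illustration: the helper names `split_columns` and `local_dictionary_columns` are hypothetical, only string/varchar columns are treated as eligible, and validation (for example, rejecting unsupported types listed in `LOCAL_DICTIONARY_INCLUDE`) is not modelled.

```python
from typing import Dict, List, Set

# Assumption: only these column types are eligible, per the quoted documentation.
STRING_TYPES = {"string", "varchar"}


def split_columns(prop: str) -> Set[str]:
    """Split a comma-separated column list into lower-cased names."""
    return {c.strip().lower() for c in prop.split(",") if c.strip()}


def local_dictionary_columns(
    schema: Dict[str, str], tblproperties: Dict[str, str]
) -> List[str]:
    """Hypothetical helper: columns that would get a local dictionary."""
    props = {k.upper(): v for k, v in tblproperties.items()}
    # Disabled by default, as the updated documentation states.
    if props.get("LOCAL_DICTIONARY_ENABLE", "false").lower() != "true":
        return []
    dictionary_include = split_columns(props.get("DICTIONARY_INCLUDE", ""))
    excluded = split_columns(props.get("LOCAL_DICTIONARY_EXCLUDE", ""))
    # Default include set: string/varchar columns not in dictionary include.
    eligible = [
        name
        for name, dtype in schema.items()
        if dtype.lower() in STRING_TYPES and name.lower() not in dictionary_include
    ]
    explicit = split_columns(props.get("LOCAL_DICTIONARY_INCLUDE", ""))
    if explicit:
        eligible = [name for name in eligible if name.lower() in explicit]
    return [name for name in eligible if name.lower() not in excluded]


if __name__ == "__main__":
    schema = {"column1": "string", "column2": "string", "column3": "int"}
    props = {
        "LOCAL_DICTIONARY_ENABLE": "true",
        "LOCAL_DICTIONARY_THRESHOLD": "1000",
        "LOCAL_DICTIONARY_INCLUDE": "column1",
        "LOCAL_DICTIONARY_EXCLUDE": "column2",
    }
    print(local_dictionary_columns(schema, props))
```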

---

[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207084686

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

- **Local Dictionary Configuration**

- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
1. Getting more compression on dimension columns with less cardinality.
2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
-
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled.
+
+ **NOTE:** Following Data Types are not Supported for Local Dictionary:
+ * SMALLINT
+ * INTEGER
+ * BIGINT
+ * DOUBLE
+ * DECIMAL
+ * TIMESTAMP
+ * DATE
+ * CHAR
--- End diff --

Why is `CHAR` not supported? As far as I know, SparkSQL treats both varchar and char as string, so in CarbonData we actually see string.

---

[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207084952

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

- **Local Dictionary Configuration**

- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
1. Getting more compression on dimension columns with less cardinality.
2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
-
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled.
+
+ **NOTE:** Following Data Types are not Supported for Local Dictionary:
+ * SMALLINT
+ * INTEGER
+ * BIGINT
+ * DOUBLE
+ * DECIMAL
+ * TIMESTAMP
+ * DATE
+ * CHAR
--- End diff --

What about complex types? You didn't mention them.

---

[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207085356

--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

- **Local Dictionary Configuration**

- Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
1. Getting more compression on dimension columns with less cardinality.
2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
-
- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled.
+
+ **NOTE:** Following Data Types are not Supported for Local Dictionary:
+ * SMALLINT
+ * INTEGER
+ * BIGINT
+ * DOUBLE
+ * DECIMAL
+ * TIMESTAMP
+ * DATE
+ * CHAR
+ * BOOLEAN
+
+ By default, Local Dictionary will be disabled.

Users will be able to pass following properties in create table command:

| Properties | Default value | Description |
| ---------- | ------------- | ----------- |
- | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table |
- | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |
+ | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified in dictionary include| Columns for which Local Dictionary is generated. |
| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |

**NOTE:** If the cardinality exceeds the threshold, this column will not use local dictionary encoding. And in this case, the data loading performance will decrease since there is a rollback procedure for local dictionary encoding.
--- End diff --

For line 149 (162):
```
Encoded data and Actual data are both stored when Local Dictionary is enabled.
```
please change it to:
```
Encoded data with & without Local dictionary are both stored when Local Dictionary is enabled during data loading, so it requires more memory than before.
```

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7741/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2590

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6466/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user praveenmeenakshi56 commented on the issue:

https://github.com/apache/carbondata/pull/2590

retest this please

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7747/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6472/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7759/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6484/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2590

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6130/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2590

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6132/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2590

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6491/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2590

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7767/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on the issue:

https://github.com/apache/carbondata/pull/2590

LGTM

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2590

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6143/

---

[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2590

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6150/

---
