Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2805: Local dictionary Data which are not supported...

Classic

List

28 messages Options

Options

12

[GitHub] carbondata pull request #2805: Local dictionary Data which are not supported...

GitHub user sgururajshetty opened a pull request:

https://github.com/apache/carbondata/pull/2805

Local dictionary Data which are not supported float and byte updated in the note

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sgururajshetty/carbondata Data_not_supported

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2805.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2805

----
commit 335ae173c72657852539caaecce3437c70e1e07a
Author: sgururajshetty <sgururajshetty@...>
Date: 2018-10-09T11:04:20Z

Data which are not supported float and byte updated in the note

----

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2805

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/750/

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2805

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9016/

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2805

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/948/

---

[GitHub] carbondata pull request #2805: [Documentation] Local dictionary Data which a...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2805#discussion_r226924887

--- Diff: docs/ddl-of-carbondata.md ---
@@ -234,7 +234,8 @@ CarbonData DDL statements are documented here,which includes:
* TIMESTAMP
* DATE
* BOOLEAN
-
+ * FLOAT
--- End diff --

indentation is wrong, change the settings to replace tabs with spaces in your editor

---

[GitHub] carbondata pull request #2805: [Documentation] Local dictionary Data which a...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2805#discussion_r226924568

--- Diff: docs/configuration-parameters.md ---
@@ -75,7 +75,7 @@ This section provides the details of all the configurations required for the Car
| carbon.use.multiple.temp.dir | false | When multiple disks are present in the system, YARN is generally configured with multiple disks to be used as temp directories for managing the containers. This configuration specifies whether to use multiple YARN local directories during data loading for disk IO load balancing.Enable ***carbon.use.local.dir*** for this configuration to take effect. **NOTE:** Data Loading is an IO intensive operation whose performance can be limited by the disk IO threshold, particularly during multi table concurrent data load.Configuring this parameter, balances the disk IO across multiple disks there by improving the over all load performance. |
| carbon.sort.temp.compressor | (none) | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These temporary files can be compressed and written in order to save the storage space. This configuration specifies the name of compressor to be used to compress the intermediate sort temp files during sort procedure in data loading. The valid values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means that Carbondata will not compress the sort temp files. **NOTE:** Compressor will be useful if you encounter disk bottleneck.Since the data needs to be compressed and decompressed,it involves additional CPU cycles,but is compensated by the high IO throughput due to less data to be written or read from the disks. |
| carbon.load.skewedDataOptimization.enabled | false | During data loading,CarbonData would divide the number of blocks equally so as to ensure all executors process same number of blocks. This mechanism satisfies most of the scenarios and ensures maximum parallel processing for optimal data loading performance.In some business scenarios, there might be scenarios where the size of blocks vary significantly and hence some executors would have to do more work if they get blocks containing more data. This configuration enables size based block allocation strategy for data loading. When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data.**NOTE:** This configuration is useful if the size of your input data files varies widely, say 1MB to 1GB.For this configuration to work effectively,knowing the data pattern and size is important and necessary. |
-| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading.It will make sure that the node load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
+| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading. It will make sure that the nodes load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
--- End diff --

give space after full stops

---

[GitHub] carbondata pull request #2805: [Documentation] Local dictionary Data which a...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2805#discussion_r226924920

--- Diff: docs/configuration-parameters.md ---
@@ -75,7 +75,7 @@ This section provides the details of all the configurations required for the Car
| carbon.use.multiple.temp.dir | false | When multiple disks are present in the system, YARN is generally configured with multiple disks to be used as temp directories for managing the containers. This configuration specifies whether to use multiple YARN local directories during data loading for disk IO load balancing.Enable ***carbon.use.local.dir*** for this configuration to take effect. **NOTE:** Data Loading is an IO intensive operation whose performance can be limited by the disk IO threshold, particularly during multi table concurrent data load.Configuring this parameter, balances the disk IO across multiple disks there by improving the over all load performance. |
| carbon.sort.temp.compressor | (none) | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These temporary files can be compressed and written in order to save the storage space. This configuration specifies the name of compressor to be used to compress the intermediate sort temp files during sort procedure in data loading. The valid values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means that Carbondata will not compress the sort temp files. **NOTE:** Compressor will be useful if you encounter disk bottleneck.Since the data needs to be compressed and decompressed,it involves additional CPU cycles,but is compensated by the high IO throughput due to less data to be written or read from the disks. |
| carbon.load.skewedDataOptimization.enabled | false | During data loading,CarbonData would divide the number of blocks equally so as to ensure all executors process same number of blocks. This mechanism satisfies most of the scenarios and ensures maximum parallel processing for optimal data loading performance.In some business scenarios, there might be scenarios where the size of blocks vary significantly and hence some executors would have to do more work if they get blocks containing more data. This configuration enables size based block allocation strategy for data loading. When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data.**NOTE:** This configuration is useful if the size of your input data files varies widely, say 1MB to 1GB.For this configuration to work effectively,knowing the data pattern and size is important and necessary. |
-| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading.It will make sure that the node load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
+| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading. It will make sure that the nodes load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
--- End diff --

give space after full stops

---

[GitHub] carbondata pull request #2805: [Documentation] Local dictionary Data which a...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2805#discussion_r226924655

--- Diff: docs/configuration-parameters.md ---
@@ -75,7 +75,7 @@ This section provides the details of all the configurations required for the Car
| carbon.use.multiple.temp.dir | false | When multiple disks are present in the system, YARN is generally configured with multiple disks to be used as temp directories for managing the containers. This configuration specifies whether to use multiple YARN local directories during data loading for disk IO load balancing.Enable ***carbon.use.local.dir*** for this configuration to take effect. **NOTE:** Data Loading is an IO intensive operation whose performance can be limited by the disk IO threshold, particularly during multi table concurrent data load.Configuring this parameter, balances the disk IO across multiple disks there by improving the over all load performance. |
| carbon.sort.temp.compressor | (none) | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These temporary files can be compressed and written in order to save the storage space. This configuration specifies the name of compressor to be used to compress the intermediate sort temp files during sort procedure in data loading. The valid values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means that Carbondata will not compress the sort temp files. **NOTE:** Compressor will be useful if you encounter disk bottleneck.Since the data needs to be compressed and decompressed,it involves additional CPU cycles,but is compensated by the high IO throughput due to less data to be written or read from the disks. |
| carbon.load.skewedDataOptimization.enabled | false | During data loading,CarbonData would divide the number of blocks equally so as to ensure all executors process same number of blocks. This mechanism satisfies most of the scenarios and ensures maximum parallel processing for optimal data loading performance.In some business scenarios, there might be scenarios where the size of blocks vary significantly and hence some executors would have to do more work if they get blocks containing more data. This configuration enables size based block allocation strategy for data loading. When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data.**NOTE:** This configuration is useful if the size of your input data files varies widely, say 1MB to 1GB.For this configuration to work effectively,knowing the data pattern and size is important and necessary. |
-| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading.It will make sure that the node load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
+| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading. It will make sure that the nodes load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
--- End diff --

give space after full stops

---

[GitHub] carbondata pull request #2805: [Documentation] Local dictionary Data which a...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2805#discussion_r226924887

--- Diff: docs/ddl-of-carbondata.md ---
@@ -234,7 +234,8 @@ CarbonData DDL statements are documented here,which includes:
* TIMESTAMP
* DATE
* BOOLEAN
-
+ * FLOAT
--- End diff --

indentation is wrong, change the settings to replace tabs with spaces in your editor

---

[GitHub] carbondata pull request #2805: [Documentation] Local dictionary Data which a...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2805#discussion_r226926054

--- Diff: docs/configuration-parameters.md ---
@@ -75,7 +75,7 @@ This section provides the details of all the configurations required for the Car
| carbon.use.multiple.temp.dir | false | When multiple disks are present in the system, YARN is generally configured with multiple disks to be used as temp directories for managing the containers. This configuration specifies whether to use multiple YARN local directories during data loading for disk IO load balancing.Enable ***carbon.use.local.dir*** for this configuration to take effect. **NOTE:** Data Loading is an IO intensive operation whose performance can be limited by the disk IO threshold, particularly during multi table concurrent data load.Configuring this parameter, balances the disk IO across multiple disks there by improving the over all load performance. |
| carbon.sort.temp.compressor | (none) | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These temporary files can be compressed and written in order to save the storage space. This configuration specifies the name of compressor to be used to compress the intermediate sort temp files during sort procedure in data loading. The valid values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means that Carbondata will not compress the sort temp files. **NOTE:** Compressor will be useful if you encounter disk bottleneck.Since the data needs to be compressed and decompressed,it involves additional CPU cycles,but is compensated by the high IO throughput due to less data to be written or read from the disks. |
| carbon.load.skewedDataOptimization.enabled | false | During data loading,CarbonData would divide the number of blocks equally so as to ensure all executors process same number of blocks. This mechanism satisfies most of the scenarios and ensures maximum parallel processing for optimal data loading performance.In some business scenarios, there might be scenarios where the size of blocks vary significantly and hence some executors would have to do more work if they get blocks containing more data. This configuration enables size based block allocation strategy for data loading. When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data.**NOTE:** This configuration is useful if the size of your input data files varies widely, say 1MB to 1GB.For this configuration to work effectively,knowing the data pattern and size is important and necessary. |
-| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading.It will make sure that the node load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
+| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading. It will make sure that the nodes load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
--- End diff --

add space after full stops

---

[GitHub] carbondata pull request #2805: [Documentation] Local dictionary Data which a...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2805#discussion_r226925427

--- Diff: docs/configuration-parameters.md ---
@@ -75,7 +75,7 @@ This section provides the details of all the configurations required for the Car
| carbon.use.multiple.temp.dir | false | When multiple disks are present in the system, YARN is generally configured with multiple disks to be used as temp directories for managing the containers. This configuration specifies whether to use multiple YARN local directories during data loading for disk IO load balancing.Enable ***carbon.use.local.dir*** for this configuration to take effect. **NOTE:** Data Loading is an IO intensive operation whose performance can be limited by the disk IO threshold, particularly during multi table concurrent data load.Configuring this parameter, balances the disk IO across multiple disks there by improving the over all load performance. |
| carbon.sort.temp.compressor | (none) | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These temporary files can be compressed and written in order to save the storage space. This configuration specifies the name of compressor to be used to compress the intermediate sort temp files during sort procedure in data loading. The valid values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means that Carbondata will not compress the sort temp files. **NOTE:** Compressor will be useful if you encounter disk bottleneck.Since the data needs to be compressed and decompressed,it involves additional CPU cycles,but is compensated by the high IO throughput due to less data to be written or read from the disks. |
| carbon.load.skewedDataOptimization.enabled | false | During data loading,CarbonData would divide the number of blocks equally so as to ensure all executors process same number of blocks. This mechanism satisfies most of the scenarios and ensures maximum parallel processing for optimal data loading performance.In some business scenarios, there might be scenarios where the size of blocks vary significantly and hence some executors would have to do more work if they get blocks containing more data. This configuration enables size based block allocation strategy for data loading. When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data.**NOTE:** This configuration is useful if the size of your input data files varies widely, say 1MB to 1GB.For this configuration to work effectively,knowing the data pattern and size is important and necessary. |
-| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading.It will make sure that the node load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
+| carbon.load.min.size.enabled | false | During Data Loading, CarbonData would divide the number of files among the available executors to parallelize the loading operation. When the input data files are very small, this action causes to generate many small carbondata files. This configuration determines whether to enable node minumun input data size allocation strategy for data loading. It will make sure that the nodes load the minimum amount of data there by reducing number of carbondata files.**NOTE:** This configuration is useful if the size of the input data files are very small, like 1MB to 256MB.Refer to ***load_min_size_inmb*** to configure the minimum size to be considered for splitting files among executors. |
--- End diff --

add a space after full stops.

---

[GitHub] carbondata pull request #2805: [Documentation] Local dictionary Data which a...

In reply to this post by qiuchenjian-2

Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2805#discussion_r226926188

--- Diff: docs/ddl-of-carbondata.md ---
@@ -234,7 +234,8 @@ CarbonData DDL statements are documented here,which includes:
* TIMESTAMP
* DATE
* BOOLEAN
-
+ * FLOAT
--- End diff --

correct the indentation. Add settings to change tab to spaces in your editor

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2805

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1049/

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2805

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1050/

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2805

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1051/

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2805

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1265/

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2805

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9317/

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user sgururajshetty commented on the issue:

https://github.com/apache/carbondata/pull/2805

@sraghunandan kindly review and help me to merge my changes

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user sgururajshetty commented on the issue:

https://github.com/apache/carbondata/pull/2805

@sraghunandan kindly review and help me to merge my changes

---

[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

In reply to this post by qiuchenjian-2

Github user sgururajshetty commented on the issue:

https://github.com/apache/carbondata/pull/2805

@sraghunandan kindly review

---

12