[GitHub] carbondata pull request #2756: [CARBONDATA-2966]Update Documentation For Avr...

qiuchenjian-2
GitHub user Indhumathi27 opened a pull request:

    https://github.com/apache/carbondata/pull/2756

    [CARBONDATA-2966]Update Documentation For Avro DataType conversion

    Updated document for Avro DataType conversion to carbon
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Indhumathi27/carbondata doc_avro

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2756.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2756
   
----
commit f30d1489dbdbe3fecd9eacb96ddff3658904b691
Author: Indhumathi27 <indhumathim27@...>
Date:   2018-09-24T18:04:04Z

    [CARBONDATA-2966]Update Documentation For Avro DataType conversion

----


---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/459/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8709/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/639/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/484/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/666/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8736/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/491/



---

[GitHub] carbondata pull request #2756: [CARBONDATA-2966]Update Documentation For Avr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2756#discussion_r220427408
 
    --- Diff: docs/configuration-parameters.md ---
    @@ -42,6 +42,7 @@ This section provides the details of all the configurations required for the Car
     | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
     | carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. |
     | carbon.unsafe.working.memory.in.mb | 512 | CarbonData supports storing data in off-heap memory for certain operations during data loading and query.This helps to avoid the Java GC and thereby improve the overall performance.The Minimum value recommeded is 512MB.Any value below this is reset to default value of 512MB.**NOTE:** The below formulas explain how to arrive at the off-heap size required.<u>Memory Required For Data Loading:</u>(*carbon.number.of.cores.while.loading*) * (Number of tables to load in parallel) * (*offheap.sort.chunk.size.inmb* + *carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb*/3.5 ). <u>Memory required for Query:</u>SPARK_EXECUTOR_INSTANCES * (*carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb* * 3.5) * spark.executor.cores |
    +| carbon.unsafe.driver.working.memory.in.mb | 60% of JVM Heap Memory | CarbonData supports storing data in unsafe on-heap memory in driver for certain operations like insert into, query for loading datamap cache. The Minimum value recommended is 512MB. |
    --- End diff --
   
    Kindly follow the same for all the parameter descriptions.


---

[GitHub] carbondata pull request #2756: [CARBONDATA-2966]Update Documentation For Avr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2756#discussion_r220427375
 
    --- Diff: docs/configuration-parameters.md ---
    @@ -42,6 +42,7 @@ This section provides the details of all the configurations required for the Car
     | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
     | carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. |
     | carbon.unsafe.working.memory.in.mb | 512 | CarbonData supports storing data in off-heap memory for certain operations during data loading and query.This helps to avoid the Java GC and thereby improve the overall performance.The Minimum value recommeded is 512MB.Any value below this is reset to default value of 512MB.**NOTE:** The below formulas explain how to arrive at the off-heap size required.<u>Memory Required For Data Loading:</u>(*carbon.number.of.cores.while.loading*) * (Number of tables to load in parallel) * (*offheap.sort.chunk.size.inmb* + *carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb*/3.5 ). <u>Memory required for Query:</u>SPARK_EXECUTOR_INSTANCES * (*carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb* * 3.5) * spark.executor.cores |
    +| carbon.unsafe.driver.working.memory.in.mb | 60% of JVM Heap Memory | CarbonData supports storing data in unsafe on-heap memory in driver for certain operations like insert into, query for loading datamap cache. The Minimum value recommended is 512MB. |
    --- End diff --
   
    The parameter description should answer the following questions:
    a. What does this parameter do?
    b. In what scenario does the user need to configure this parameter?
    c. Are there any benefits to configuring this parameter?
    d. What is the default value?
    e. What is the value range, if any?
    f. Are there any limitations?
    g. Any key information to be highlighted?
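
As an illustration of how such a parameter would be applied, the property under review would typically be set in `carbon.properties`. The snippet below is a hypothetical sketch (the value shown is an example, not taken from the PR):

```properties
# Hypothetical carbon.properties fragment: cap the unsafe on-heap working
# memory used by the driver (e.g. for datamap cache loading).
# Default per the doc change: 60% of JVM heap; recommended minimum 512 MB.
carbon.unsafe.driver.working.memory.in.mb=1024
```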



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/500/



---

[GitHub] carbondata pull request #2756: [CARBONDATA-2966]Update Documentation For Avr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2756#discussion_r220428630
 
    --- Diff: docs/sdk-guide.md ---
    @@ -181,22 +181,31 @@ public class TestSdkJson {
     ```
     
     ## Datatypes Mapping
    -Each of SQL data types are mapped into data types of SDK. Following are the mapping:
    +Each of SQL data types and Avro Data Types are mapped into data types of SDK. Following are the mapping:
     
    -| SQL DataTypes | Mapped SDK DataTypes |
    +| SQL DataTypes | Avro DataTypes | Mapped SDK DataTypes |
     |---------------|----------------------|
    --- End diff --
   
    Check the formatting
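
For readers following the mapping-table discussion above, the Avro-to-SDK datatype mapping that the doc change documents can be sketched as a simple lookup. This is an illustrative sketch based on the table in docs/sdk-guide.md; the helper function itself is hypothetical, not part of the SDK:

```python
# Illustrative Avro -> CarbonData SDK datatype lookup, mirroring the
# mapping table added to docs/sdk-guide.md. The helper is hypothetical.
AVRO_TO_SDK = {
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "LONG",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
    "enum": "STRING",
    "record": "STRUCT",
    "array": "ARRAY",
    "map": "MAP",
}

def map_avro_type(avro_type: str) -> str:
    """Return the SDK data type name for a given Avro type name."""
    try:
        return AVRO_TO_SDK[avro_type]
    except KeyError:
        raise ValueError(f"Unsupported Avro type: {avro_type}")
```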


---

[GitHub] carbondata pull request #2756: [CARBONDATA-2966]Update Documentation For Avr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user Indhumathi27 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2756#discussion_r220429435
 
    --- Diff: docs/configuration-parameters.md ---
    @@ -42,6 +42,7 @@ This section provides the details of all the configurations required for the Car
     | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. |
     | carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. |
     | carbon.unsafe.working.memory.in.mb | 512 | CarbonData supports storing data in off-heap memory for certain operations during data loading and query.This helps to avoid the Java GC and thereby improve the overall performance.The Minimum value recommeded is 512MB.Any value below this is reset to default value of 512MB.**NOTE:** The below formulas explain how to arrive at the off-heap size required.<u>Memory Required For Data Loading:</u>(*carbon.number.of.cores.while.loading*) * (Number of tables to load in parallel) * (*offheap.sort.chunk.size.inmb* + *carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb*/3.5 ). <u>Memory required for Query:</u>SPARK_EXECUTOR_INSTANCES * (*carbon.blockletgroup.size.in.mb* + *carbon.blockletgroup.size.in.mb* * 3.5) * spark.executor.cores |
    +| carbon.unsafe.driver.working.memory.in.mb | 60% of JVM Heap Memory | CarbonData supports storing data in unsafe on-heap memory in driver for certain operations like insert into, query for loading datamap cache. The Minimum value recommended is 512MB. |
    --- End diff --
   
    Okay. I think the parameter description covers all the applicable questions.


---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/504/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8751/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/682/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user Indhumathi27 commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Retest this please



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/508/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/686/



---

[GitHub] carbondata issue #2756: [CARBONDATA-2966]Update Documentation For Avro DataT...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2756
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8755/



---