[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Optimize carbonData using a...

qiuchenjian-2

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054

Build succeeded with Spark 2.2.1. Please check CI: http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2426/

---

qiuchenjian-2

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054

Build succeeded with Spark 2.3.2. Please check CI: http://136.243.101.176:8080/job/carbondataprbuilder2.3/10467/

---

qiuchenjian-2

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054

Build succeeded with Spark 2.1.0. Please check CI: http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2212/

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246044066

--- Diff: docs/alluxio-guide.md ---
@@ -0,0 +1,42 @@
+
+
+
+# Presto guide
--- End diff --

Presto? This file is docs/alluxio-guide.md, so the heading should presumably refer to Alluxio.

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246047276

--- Diff: docs/documentation.md ---
@@ -29,15 +29,15 @@ Apache CarbonData is a new big data file format for faster interactive query usi

**Quick Start:** [Run an example program](./quick-start-guide.md#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) on your local machine or [study some examples](https://github.com/apache/carbondata/tree/master/examples/spark2/src/main/scala/org/apache/carbondata/examples).

-**CarbonData SQL Language Reference:** CarbonData extends the Spark SQL language and adds several [DDL](./ddl-of-carbondata.md) and [DML](./dml-of-carbondata.md) statements to support operations on it.Refer to the [Reference Manual](./language-manual.md) to understand the supported features and functions.
+**CarbonData SQL Language Reference:** CarbonData extends the Spark SQL language and adds several [DDL](./ddl-of-carbondata.md) and [DML](./dml-of-carbondata.md) statements to support operations on it. Refer to the [Reference Manual](./language-manual.md) to understand the supported features and functions.

**Programming Guides:** You can read our guides about [Java APIs supported](./sdk-guide.md) or [C++ APIs supported](./csdk-guide.md) to learn how to integrate CarbonData with your applications.

## Integration

-CarbonData can be integrated with popular Execution engines like [Spark](./quick-start-guide.md#spark) , [Presto](./quick-start-guide.md#presto) and [Hive](./quick-start-guide.md#hive).Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of Integrating CarbonData.
+CarbonData can be integrated with popular Execution engines like [Spark](./quick-start-guide.md#spark) , [Presto](./quick-start-guide.md#presto) and [Hive](./quick-start-guide.md#hive). CarbonData also supports read and write with [Alluxio](./quick-start-guide.md#alluxio). Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of Integrating CarbonData.
--- End diff --

I don't think it's appropriate to mention Alluxio right after the execution engines (lowercase "e", not "Execution") such as SparkSQL/Presto/Hive.

Instead, we can add a separate paragraph stating that CarbonData can integrate with storage engines such as HDFS, S3, OBS, and Alluxio.

@chenliang613 What do you think?

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246047576

--- Diff: docs/quick-start-guide.md ---
@@ -54,7 +54,8 @@ CarbonData can be integrated with Spark,Presto and Hive Execution Engines. The b
### Hive
[Installing and Configuring CarbonData on Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)

-
+### Alluxio
--- End diff --

As mentioned above, we may need to adjust the location for this section.

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246049322

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
/**
* configure alluxio:
* 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
*/
-
object AlluxioExample {
- def main(args: Array[String]) {
- val spark = ExampleUtils.createCarbonSession("AlluxioExample")
- exampleBody(spark)
- spark.close()
+ def main (args: Array[String]) {
+ val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+ storePath = "alluxio://localhost:19998/carbondata")
+ exampleBody(carbon)
+ carbon.close()
}

- def exampleBody(spark : SparkSession): Unit = {
+ def exampleBody (spark: SparkSession): Unit = {
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
FileFactory.getConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")

// Specify date format based on raw data
CarbonProperties.getInstance()
.addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")

- spark.sql("DROP TABLE IF EXISTS alluxio_table")
+ val time = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
+
+ val mFsShell = new FileSystemShell()
+ val localFile = rootPath + "/hadoop/src/test/resources/data.csv"
+ val remotePath = "/carbon_alluxio" + time + ".csv"
+ val remoteFile = "alluxio://localhost:19998/carbon_alluxio" + time + ".csv"
--- End diff --

Use 'prefix + remotePath' instead of concatenating the path by hand.
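
A minimal sketch of what this suggestion could look like. The names `RemotePathSketch`, `alluxioPrefix`, and `remotePaths` are hypothetical and not part of the pull request; the authority `alluxio://localhost:19998` is the one already used for `storePath` in the diff.

```scala
import java.text.SimpleDateFormat
import java.util.Date

object RemotePathSketch {
  // Hypothetical constant: the Alluxio scheme and authority used in the example.
  val alluxioPrefix = "alluxio://localhost:19998"

  // Build the Alluxio-relative path once and derive the full URI from it,
  // so the file name is never spelled out twice.
  def remotePaths(time: String): (String, String) = {
    val remotePath = "/carbon_alluxio" + time + ".csv"
    val remoteFile = alluxioPrefix + remotePath
    (remotePath, remoteFile)
  }

  def main(args: Array[String]): Unit = {
    val time = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    val (remotePath, remoteFile) = remotePaths(time)
    println(remotePath)
    println(remoteFile)
  }
}
```

With this shape, changing the file name or the Alluxio address touches exactly one line each.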

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246050916

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
/**
* configure alluxio:
* 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
*/
-
object AlluxioExample {
- def main(args: Array[String]) {
- val spark = ExampleUtils.createCarbonSession("AlluxioExample")
- exampleBody(spark)
- spark.close()
+ def main (args: Array[String]) {
+ val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+ storePath = "alluxio://localhost:19998/carbondata")
+ exampleBody(carbon)
+ carbon.close()
}

- def exampleBody(spark : SparkSession): Unit = {
+ def exampleBody (spark: SparkSession): Unit = {
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
--- End diff --

Providing only a DataFrame example is not enough. It seems we should also add some configuration to the carbon properties file and the Spark properties to make it work through beeline, and document this clearly in case users want to try it from beeline.

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246249301

--- Diff: docs/alluxio-guide.md ---
@@ -0,0 +1,42 @@
+
+
+
+# Presto guide
--- End diff --

changed

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246249496

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
/**
* configure alluxio:
* 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
*/
-
object AlluxioExample {
- def main(args: Array[String]) {
- val spark = ExampleUtils.createCarbonSession("AlluxioExample")
- exampleBody(spark)
- spark.close()
+ def main (args: Array[String]) {
+ val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+ storePath = "alluxio://localhost:19998/carbondata")
+ exampleBody(carbon)
+ carbon.close()
}

- def exampleBody(spark : SparkSession): Unit = {
+ def exampleBody (spark: SparkSession): Unit = {
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
FileFactory.getConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")

// Specify date format based on raw data
CarbonProperties.getInstance()
.addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")

- spark.sql("DROP TABLE IF EXISTS alluxio_table")
+ val time = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
+
+ val mFsShell = new FileSystemShell()
+ val localFile = rootPath + "/hadoop/src/test/resources/data.csv"
+ val remotePath = "/carbon_alluxio" + time + ".csv"
+ val remoteFile = "alluxio://localhost:19998/carbon_alluxio" + time + ".csv"
--- End diff --

ok

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246280098

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
/**
* configure alluxio:
* 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
*/
-
object AlluxioExample {
- def main(args: Array[String]) {
- val spark = ExampleUtils.createCarbonSession("AlluxioExample")
- exampleBody(spark)
- spark.close()
+ def main (args: Array[String]) {
+ val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+ storePath = "alluxio://localhost:19998/carbondata")
+ exampleBody(carbon)
+ carbon.close()
}

- def exampleBody(spark : SparkSession): Unit = {
+ def exampleBody (spark: SparkSession): Unit = {
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
--- End diff --

spark-shell and spark-submit work now, but CarbonThriftServer and beeline still have some problems.

---

qiuchenjian-2

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246427391

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
/**
* configure alluxio:
* 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
*/
-
object AlluxioExample {
- def main(args: Array[String]) {
- val spark = ExampleUtils.createCarbonSession("AlluxioExample")
- exampleBody(spark)
- spark.close()
+ def main (args: Array[String]) {
+ val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+ storePath = "alluxio://localhost:19998/carbondata")
+ exampleBody(carbon)
+ carbon.close()
}

- def exampleBody(spark : SparkSession): Unit = {
+ def exampleBody (spark: SparkSession): Unit = {
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
--- End diff --

So you need to mention this limitation in the current document.

---
