GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2987

[CARBONDATA-3167] Add an example for DataFrame read/write of a non-transactional table from/to S3

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done: added a test case
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. No

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-3167_DataFrameSDKS3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2987.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2987

----
commit 81839a0846e08ff6dad28f88bb36fa1a75ed202b
Author: xubo245 <xubo29@...>
Date: 2018-12-13T16:11:30Z

    [CARBONDATA-3167] Add an example for DataFrame read/write of a non-transactional table from/to S3
----
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1742/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1953/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10001/
---
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r241619313

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.fs.s3a.Constants
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+import org.apache.spark.sql.types._
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datamap.DataMapStoreManager
+
+/**
+ * This example show DataFrame How to read/Write SDK data from/to S3
+ */
+object DataFrameSDKS3Example {
+
+  val rootPath = new File(this.getClass.getResource("/").getPath
+    + "../../../..").getCanonicalPath
+
+  def main(args: Array[String]): Unit = {
--- End diff --

What is the purpose of main? I think test cases don't use main.
---
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r241619136

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,184 @@
[... license header, imports and object declaration as in the diff quote above ...]
+  def main(args: Array[String]): Unit = {
+    if (args.length > 4 || args.length == 2) {
+      val LOGGER = LogServiceFactory.getLogService(classOf[DataMapStoreManager].getName)
+      LOGGER.error("If you want to use S3, Please input parameters:" +
+        " <access-key> <secret-key> <s3-endpoint> [table-path-on-s3];" +
+        "If you want to run in local, please use default to input local path")
+      System.exit(0)
+    }
+    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
+
+    val sparkSession = SparkSession
+      .builder()
+      .master("local")
+      .appName("SparkSessionExample")
+      .config("spark.sql.warehouse.dir", warehouse)
+      .getOrCreate()
+
+    sparkSession.sparkContext.setLogLevel("ERROR")
+
+    exampleBody(sparkSession, args)
+
+    sparkSession.stop()
+  }
+
+  def exampleBody(sparkSession: SparkSession, args: Array[String] = Array.empty): Unit = {
+
+    try {
+      val df = sparkSession.emptyDataFrame
+
+      var path = s"$rootPath/examples/spark2/target/carbon"
+      if (args.length == 1) {
+        path = args(0)
+      }
+      if (args.length == 3) {
+        path = "s3a://xubo/sdk/DFTest"
--- End diff --

I think the path of the test case shouldn't contain personal information; you can replace it with the function name.
---
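The quoted main accepts `<access-key> <secret-key> <s3-endpoint>` but the snippet above does not show where they are applied. As a rough sketch (not the PR's actual code), the credentials would typically be wired into the session's Hadoop configuration via the `org.apache.hadoop.fs.s3a.Constants` keys the example already imports; the helper name `buildSession` is hypothetical:

```scala
import org.apache.hadoop.fs.s3a.Constants
import org.apache.spark.sql.SparkSession

object S3SessionSketch {
  // Hypothetical helper: applies <access-key> <secret-key> <s3-endpoint> from
  // the command line so that an s3a:// table path resolves; with fewer than
  // three arguments the session is built for a plain local path.
  def buildSession(args: Array[String]): SparkSession = {
    val builder = SparkSession.builder()
      .master("local")
      .appName("DataFrameSDKS3Example")
    if (args.length >= 3) {
      builder
        .config("spark.hadoop." + Constants.ACCESS_KEY, args(0)) // fs.s3a.access.key
        .config("spark.hadoop." + Constants.SECRET_KEY, args(1)) // fs.s3a.secret.key
        .config("spark.hadoop." + Constants.ENDPOINT, args(2))   // fs.s3a.endpoint
    }
    builder.getOrCreate()
  }
}
```

The `spark.hadoop.` prefix forwards each setting into the Hadoop `Configuration` that the s3a filesystem reads at runtime.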
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1743/
---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r241622327

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,184 @@
[... license header, imports and object declaration as in the first diff quote ...]
+  def main(args: Array[String]): Unit = {
--- End diff --

This is an example; please check other examples, like org.apache.carbondata.examples.CarbonSessionExample#main.
---
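The pattern xubo245 points to can be sketched as follows. This is an illustrative skeleton under the assumption that examples in this repository pair a runnable main with a reusable body method, as in the quoted diff; names other than the quoted `exampleBody` are made up:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the example/main split: main is for running the example from the
// command line, while tests skip main and call exampleBody directly with the
// suite's own shared session.
object ExampleSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("ExampleSketch")
      .getOrCreate()
    try {
      exampleBody(spark)
    } finally {
      spark.stop()
    }
  }

  def exampleBody(spark: SparkSession): Unit = {
    // real examples put their read/write logic here
    spark.range(5).show()
  }
}
```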
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r241622359

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,184 @@
[... same quote as in the comment above ...]
+      if (args.length == 3) {
+        path = "s3a://xubo/sdk/DFTest"
--- End diff --

done.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1744/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10003/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1955/
---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2987 @jackylk @QiangCai @KanakaKumar @kunal642 Please review it.
---
Github user BJangir commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r243167366

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,185 @@
[... license header, imports, object declaration and main as in the earlier diff quotes; the S3 path is now "s3a://carbon/sdk/DFTest" ...]
+      if (args.length > 3) {
+        path = args(3)
+      }
+
+      write(df, path, args);
+      read(df, path, args);
+    } catch {
+      case e: Exception => assert(false)
+    }
+  }
+
+  /**
+   * inherit DataFrame from other place,
+   * it need create CarbonSession for read CarbonData from S3
+   *
+   * @param df DataFrame, including SparkConf
+   * @param path read path
+   * @param args argument, including ak, sk, endpoint
+   */
+  def read(df: DataFrame, path: String, args: Array[String]): Unit = {
+    val carbonSession = DataFrameToCarbonSession(df, path, args, 3);
+
+    val result = carbonSession
+      .read
+      .format("carbon")
+      .load(path)
+    result.show()
+    result.foreach { each =>
+      assert(each.get(0).toString.contains("city"))
+    }
+    carbonSession.stop()
+  }
+
+  /**
+   * inherit DataFrame from other place,
+   * it need create CarbonSession for write CarbonData to S3
+   *
+   * @param df DataFrame, including SparkConf
+   * @param path write path
+   * @param args argument, including ak, sk, endpoint
+   */
+  def write(df: DataFrame, path: String, args: Array[String]): Unit = {
+    val carbonSession = DataFrameToCarbonSession(df, path, args, 4);
+
+    val rdd = carbonSession.sqlContext.sparkContext
+      .parallelize(1 to 1200, 4)
+      .map { x =>
+        ("city" + x % 8, "country" + x % 1103, "planet" + x % 10007, x.toString,
+          (x % 16).toShort, x / 2, (x << 1).toLong, x.toDouble / 13, x.toDouble / 11)
+      }.map { x =>
+        Row(x._1, x._2, x._3, x._4, x._5, x._6, x._7, x._8, x._9)
+      }
+
+    val schema = StructType(
+      Seq(
+        StructField("city", StringType, nullable = false),
+        StructField("country", StringType, nullable = false),
+        StructField("planet", StringType, nullable = false),
+        StructField("id", StringType, nullable = false),
+        StructField("m1", ShortType, nullable = false),
+        StructField("m2", IntegerType, nullable = false),
+        StructField("m3", LongType, nullable = false),
+        StructField("m4", DoubleType, nullable = false),
+        StructField("m5", DoubleType, nullable = false)
+      )
+    )
+
+    carbonSession.createDataFrame(rdd, schema)
--- End diff --

@xubo245, please try rdd.toDF directly and pass the column names in selectExpr. For this you need to import carbonSession.implicits._.
---
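BJangir's suggestion can be sketched like this: build the DataFrame from the tuple RDD with `toDF` and explicit column names instead of the hand-written `Row` mapping plus `StructType`. This is a minimal sketch assuming a local session (the session value name is illustrative, not the PR's actual code); note that `toDF` infers column types from the tuple, so nullability may differ slightly from the explicit schema above:

```scala
import org.apache.spark.sql.SparkSession

object ToDFSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("ToDFSketch").getOrCreate()
    import spark.implicits._ // brings the rdd.toDF conversion into scope

    // Same data as the quoted write(), converted in one step via toDF.
    val df = spark.sparkContext
      .parallelize(1 to 1200, 4)
      .map { x =>
        ("city" + x % 8, "country" + x % 1103, "planet" + x % 10007, x.toString,
          (x % 16).toShort, x / 2, (x << 1).toLong, x.toDouble / 13, x.toDouble / 11)
      }
      .toDF("city", "country", "planet", "id", "m1", "m2", "m3", "m4", "m5")

    df.printSchema()
    spark.stop()
  }
}
```

This removes the second `.map` to `Row` and the `StructType` declaration entirely; `selectExpr` would only be needed to rename or cast columns afterwards.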
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1908/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10162/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2119/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1966/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10218/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2239/
---