GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2987

[CARBONDATA-3167] Add an example for DataFrame read/write of a non-transactional table from/to S3

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done: added a test case
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. No

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-3167_DataFrameSDKS3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2987.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2987

----
commit 81839a0846e08ff6dad28f88bb36fa1a75ed202b
Author: xubo245 <xubo29@...>
Date: 2018-12-13T16:11:30Z

    [CARBONDATA-3167] Add an example for DataFrame read/write of a non-transactional table from/to S3
----
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1742/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1953/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10001/
---
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r241619313

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.fs.s3a.Constants
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+import org.apache.spark.sql.types._
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datamap.DataMapStoreManager
+
+/**
+ * This example show DataFrame How to read/Write SDK data from/to S3
+ */
+object DataFrameSDKS3Example {
+
+  val rootPath = new File(this.getClass.getResource("/").getPath
+    + "../../../..").getCanonicalPath
+
+  def main(args: Array[String]): Unit = {
--- End diff --

What is the purpose of main? I think test cases don't use main.
---
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r241619136

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,184 @@
[... license header, imports and object declaration as in the diff quote above ...]
+  def main(args: Array[String]): Unit = {
+    if (args.length > 4 || args.length == 2) {
+      val LOGGER = LogServiceFactory.getLogService(classOf[DataMapStoreManager].getName)
+      LOGGER.error("If you want to use S3, Please input parameters:" +
+        " <access-key> <secret-key> <s3-endpoint> [table-path-on-s3];" +
+        "If you want to run in local, please use default to input local path")
+      System.exit(0)
+    }
+    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
+
+    val sparkSession = SparkSession
+      .builder()
+      .master("local")
+      .appName("SparkSessionExample")
+      .config("spark.sql.warehouse.dir", warehouse)
+      .getOrCreate()
+
+    sparkSession.sparkContext.setLogLevel("ERROR")
+
+    exampleBody(sparkSession, args)
+
+    sparkSession.stop()
+  }
+
+  def exampleBody(sparkSession: SparkSession, args: Array[String] = Array.empty): Unit = {
+
+    try {
+      val df = sparkSession.emptyDataFrame
+
+      var path = s"$rootPath/examples/spark2/target/carbon"
+      if (args.length == 1) {
+        path = args(0)
+      }
+      if (args.length == 3) {
+        path = "s3a://xubo/sdk/DFTest"
--- End diff --

I think the path of the test case shouldn't contain personal information; you can replace it with the function name.
---
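The quoted main accepts `<access-key> <secret-key> <s3-endpoint>` but the snippet above does not show where they are applied. As a rough sketch (not the PR's actual code), the credentials would typically be wired into the session's Hadoop configuration via the `org.apache.hadoop.fs.s3a.Constants` keys the example already imports; the helper name `buildSession` is hypothetical:

```scala
import org.apache.hadoop.fs.s3a.Constants
import org.apache.spark.sql.SparkSession

object S3SessionSketch {
  // Hypothetical helper: applies <access-key> <secret-key> <s3-endpoint> from
  // the command line so that an s3a:// table path resolves; with fewer than
  // three arguments the session is built for a plain local path.
  def buildSession(args: Array[String]): SparkSession = {
    val builder = SparkSession.builder()
      .master("local")
      .appName("DataFrameSDKS3Example")
    if (args.length >= 3) {
      builder
        .config("spark.hadoop." + Constants.ACCESS_KEY, args(0)) // fs.s3a.access.key
        .config("spark.hadoop." + Constants.SECRET_KEY, args(1)) // fs.s3a.secret.key
        .config("spark.hadoop." + Constants.ENDPOINT, args(2))   // fs.s3a.endpoint
    }
    builder.getOrCreate()
  }
}
```

The `spark.hadoop.` prefix forwards each setting into the Hadoop `Configuration` that the s3a filesystem reads at runtime.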
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1743/
---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r241622327

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,184 @@
[... license header, imports and object declaration as in the first diff quote ...]
+  def main(args: Array[String]): Unit = {
--- End diff --

This is an example; please check other examples, like org.apache.carbondata.examples.CarbonSessionExample#main.
---
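The pattern xubo245 points to can be sketched as follows. This is an illustrative skeleton under the assumption that examples in this repository pair a runnable main with a reusable body method, as in the quoted diff; names other than the quoted `exampleBody` are made up:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the example/main split: main is for running the example from the
// command line, while tests skip main and call exampleBody directly with the
// suite's own shared session.
object ExampleSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("ExampleSketch")
      .getOrCreate()
    try {
      exampleBody(spark)
    } finally {
      spark.stop()
    }
  }

  def exampleBody(spark: SparkSession): Unit = {
    // real examples put their read/write logic here
    spark.range(5).show()
  }
}
```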
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r241622359

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,184 @@
[... same quote as in the comment above ...]
+      if (args.length == 3) {
+        path = "s3a://xubo/sdk/DFTest"
--- End diff --

done.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1744/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10003/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1955/
---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2987 @jackylk @QiangCai @KanakaKumar @kunal642 Please review it.
---
Github user BJangir commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2987#discussion_r243167366

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataFrameSDKS3Example.scala ---
@@ -0,0 +1,185 @@
[... license header, imports, object declaration and main as in the earlier diff quotes; the S3 path is now "s3a://carbon/sdk/DFTest" ...]
+      if (args.length > 3) {
+        path = args(3)
+      }
+
+      write(df, path, args);
+      read(df, path, args);
+    } catch {
+      case e: Exception => assert(false)
+    }
+  }
+
+  /**
+   * inherit DataFrame from other place,
+   * it need create CarbonSession for read CarbonData from S3
+   *
+   * @param df DataFrame, including SparkConf
+   * @param path read path
+   * @param args argument, including ak, sk, endpoint
+   */
+  def read(df: DataFrame, path: String, args: Array[String]): Unit = {
+    val carbonSession = DataFrameToCarbonSession(df, path, args, 3);
+
+    val result = carbonSession
+      .read
+      .format("carbon")
+      .load(path)
+    result.show()
+    result.foreach { each =>
+      assert(each.get(0).toString.contains("city"))
+    }
+    carbonSession.stop()
+  }
+
+  /**
+   * inherit DataFrame from other place,
+   * it need create CarbonSession for write CarbonData to S3
+   *
+   * @param df DataFrame, including SparkConf
+   * @param path write path
+   * @param args argument, including ak, sk, endpoint
+   */
+  def write(df: DataFrame, path: String, args: Array[String]): Unit = {
+    val carbonSession = DataFrameToCarbonSession(df, path, args, 4);
+
+    val rdd = carbonSession.sqlContext.sparkContext
+      .parallelize(1 to 1200, 4)
+      .map { x =>
+        ("city" + x % 8, "country" + x % 1103, "planet" + x % 10007, x.toString,
+          (x % 16).toShort, x / 2, (x << 1).toLong, x.toDouble / 13, x.toDouble / 11)
+      }.map { x =>
+        Row(x._1, x._2, x._3, x._4, x._5, x._6, x._7, x._8, x._9)
+      }
+
+    val schema = StructType(
+      Seq(
+        StructField("city", StringType, nullable = false),
+        StructField("country", StringType, nullable = false),
+        StructField("planet", StringType, nullable = false),
+        StructField("id", StringType, nullable = false),
+        StructField("m1", ShortType, nullable = false),
+        StructField("m2", IntegerType, nullable = false),
+        StructField("m3", LongType, nullable = false),
+        StructField("m4", DoubleType, nullable = false),
+        StructField("m5", DoubleType, nullable = false)
+      )
+    )
+
+    carbonSession.createDataFrame(rdd, schema)
--- End diff --

@xubo245, please try rdd.toDF directly and pass the column names in selectExpr. For this you need to import carbonSession.implicits._.
---
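BJangir's suggestion can be sketched like this: build the DataFrame from the tuple RDD with `toDF` and explicit column names instead of the hand-written `Row` mapping plus `StructType`. This is a minimal sketch assuming a local session (the session value name is illustrative, not the PR's actual code); note that `toDF` infers column types from the tuple, so nullability may differ slightly from the explicit schema above:

```scala
import org.apache.spark.sql.SparkSession

object ToDFSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("ToDFSketch").getOrCreate()
    import spark.implicits._ // brings the rdd.toDF conversion into scope

    // Same data as the quoted write(), converted in one step via toDF.
    val df = spark.sparkContext
      .parallelize(1 to 1200, 4)
      .map { x =>
        ("city" + x % 8, "country" + x % 1103, "planet" + x % 10007, x.toString,
          (x % 16).toShort, x / 2, (x << 1).toLong, x.toDouble / 13, x.toDouble / 11)
      }
      .toDF("city", "country", "planet", "id", "m1", "m2", "m3", "m4", "m5")

    df.printSchema()
    spark.stop()
  }
}
```

This removes the second `.map` to `Row` and the `StructType` declaration entirely; `selectExpr` would only be needed to rename or cast columns afterwards.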
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1908/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10162/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2119/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1966/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10218/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2987 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2239/
---