Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 implementation and...

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1584#discussion_r161362393

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
+import org.apache.spark.sql.SparkSession
+import org.slf4j.{Logger, LoggerFactory}
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+object S3Example {
+
+ /**
+ * This example demonstrate usage of s3 as a store.
+ *
+ * @param args require three parameters "Access-key" "Secret-key"
+ * "s3 bucket path"
+ */
+
+ def main(args: Array[String]) {
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
+ val warehouse = s"$rootPath/examples/spark2/target/warehouse"
+ val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
+ val logger: Logger = LoggerFactory.getLogger(this.getClass)
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
--- End diff --

Please do the same for the other three examples as well.

---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1584#discussion_r161362429

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/MultiStoreExample.scala ---
@@ -0,0 +1,153 @@
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
+import org.apache.spark.sql.SparkSession
+import org.slf4j.{Logger, LoggerFactory}
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+object MultiStoreExample {
+
+ /** This example demonstrate the usage of multiple filesystem(s3 and local) on one carbon session
+ *
+ * @param args represents "fs.s3a.access.key" "fs.s3a.secret.key" "bucket-name"
+ */
+
+ def main(args: Array[String]) {
+
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
+ val storeLocation = s"$rootPath/examples/spark2/target/store"
+ val warehouse = s"$rootPath/examples/spark2/target/warehouse"
+ val metastoredb = s"$rootPath/examples/spark2/target"
+ val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
+ val logger: Logger = LoggerFactory.getLogger(this.getClass)
+
+ if (args.length != 3) {
+ logger.error("Usage: java CarbonS3Example <fs.s3a.access.key> <fs.s3a.secret" +
+ ".key> <bucket-name>")
+ System.exit(0)
+ }
+
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
+ .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
+ .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
+
+ import org.apache.spark.sql.CarbonSession._
+ val spark = SparkSession
+ .builder()
+ .master("local")
+ .appName("CarbonSessionExample")
+ .config("spark.sql.warehouse.dir", warehouse)
+ .config("spark.driver.host", "localhost")
+ .config("spark.hadoop." + ACCESS_KEY, args(0))
+ .config("spark.hadoop." + SECRET_KEY, args(1))
+ .getOrCreateCarbonSession(storeLocation, warehouse)
--- End diff --

Do not pass a store location here; use `CREATE TABLE ... LOCATION ...` to set the location of each table.
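A minimal sketch of this suggestion, assuming the CarbonData version in use accepts a trailing `LOCATION 'path'` clause after `STORED BY 'carbondata'`: the storage path travels with each table's DDL instead of being given once to `getOrCreateCarbonSession`, so one session can hold a local table and an S3 table side by side. The helper `createTableDdl`, the column list, and the paths are hypothetical; the resulting string would be passed to `spark.sql(...)`.

```scala
object TableLocationSketch {

  /** Hypothetical helper: builds a CarbonData CREATE TABLE statement whose
    * storage path is supplied per table via a LOCATION clause. */
  def createTableDdl(table: String, columns: Seq[(String, String)], location: String): String = {
    require(columns.nonEmpty, "at least one column is required")
    // One "  name TYPE" line per column, comma-separated.
    val cols = columns.map { case (name, tpe) => s"  $name $tpe" }.mkString(",\n")
    s"""CREATE TABLE IF NOT EXISTS $table (
       |$cols
       |)
       |STORED BY 'carbondata'
       |LOCATION '$location'""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    val cols = Seq("id" -> "INT", "name" -> "STRING")
    // Illustrative paths: one table on the local filesystem, one on S3.
    println(createTableDdl("local_table", cols, "/tmp/carbon/local_table"))
    println(createTableDdl("s3_table", cols, "s3a://my-bucket/carbon/s3_table"))
  }
}
```

Keeping the location in the DDL also means the session-level store location no longer decides where every table lives, which is what makes the multi-filesystem case work in a single session.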

---

[GitHub] carbondata issue #1584: [CARBONDATA-1827] Added S3 Implementation

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1584

Please change the merge target branch to carbonstore, since master is currently frozen for the 1.3.0 release.

---

[GitHub] carbondata issue #1584: [CARBONDATA-1827] Added S3 Implementation

Github user SangeetaGulia commented on the issue:

https://github.com/apache/carbondata/pull/1584

@jackylk we have raised a new PR, [#1805](https://github.com/apache/carbondata/pull/1805), with all the review comments resolved. It targets the carbonstore branch.

Please review it.

---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

Github user SangeetaGulia commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1584#discussion_r161512369

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/MultiStoreExample.scala ---
@@ -0,0 +1,153 @@
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
+import org.apache.spark.sql.SparkSession
+import org.slf4j.{Logger, LoggerFactory}
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+object MultiStoreExample {
+
+ /** This example demonstrate the usage of multiple filesystem(s3 and local) on one carbon session
+ *
+ * @param args represents "fs.s3a.access.key" "fs.s3a.secret.key" "bucket-name"
+ */
+
+ def main(args: Array[String]) {
+
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
+ val storeLocation = s"$rootPath/examples/spark2/target/store"
+ val warehouse = s"$rootPath/examples/spark2/target/warehouse"
+ val metastoredb = s"$rootPath/examples/spark2/target"
+ val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
+ val logger: Logger = LoggerFactory.getLogger(this.getClass)
+
+ if (args.length != 3) {
+ logger.error("Usage: java CarbonS3Example <fs.s3a.access.key> <fs.s3a.secret" +
+ ".key> <bucket-name>")
+ System.exit(0)
+ }
+
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
+ .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
+ .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
+
+ import org.apache.spark.sql.CarbonSession._
+ val spark = SparkSession
+ .builder()
+ .master("local")
+ .appName("CarbonSessionExample")
+ .config("spark.sql.warehouse.dir", warehouse)
+ .config("spark.driver.host", "localhost")
+ .config("spark.hadoop." + ACCESS_KEY, args(0))
+ .config("spark.hadoop." + SECRET_KEY, args(1))
+ .getOrCreateCarbonSession(storeLocation, warehouse)
--- End diff --

Refactored to provide the store location in the CREATE TABLE command.

---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

Github user SangeetaGulia commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1584#discussion_r161512456

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3CsvExample.scala ---
@@ -0,0 +1,113 @@
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
+import org.apache.spark.sql.SparkSession
+import org.slf4j.{Logger, LoggerFactory}
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+object S3CsvExample {
+
+ /**
+ * This example demonstrate to create local store having csv on s3.
--- End diff --

Done.

---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

Github user SangeetaGulia commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1584#discussion_r161513550

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
@@ -0,0 +1,152 @@
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
+import org.apache.spark.sql.SparkSession
+import org.slf4j.{Logger, LoggerFactory}
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+object S3Example {
+
+ /**
+ * This example demonstrate usage of s3 as a store.
+ *
+ * @param args require three parameters "Access-key" "Secret-key"
+ * "s3 bucket path"
+ */
+
+ def main(args: Array[String]) {
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
+ val warehouse = s"$rootPath/examples/spark2/target/warehouse"
+ val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
+ val logger: Logger = LoggerFactory.getLogger(this.getClass)
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
+ .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
+ .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
+ .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
+
+ import org.apache.spark.sql.CarbonSession._
+ if (args.length != 3) {
+ logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
+ "<carbon store location>")
+ System.exit(0)
+ }
+
+ val (accessKey, secretKey) = getKeyOnPrefix(args(2))
+ val spark = SparkSession
+ .builder()
+ .master("local")
+ .appName("CarbonSessionExample")
+ .config("spark.sql.warehouse.dir", warehouse)
+ .config("spark.driver.host", "localhost")
+ .config(accessKey, args(0))
+ .config(secretKey, args(1))
+ .getOrCreateCarbonSession(args(2), warehouse)
--- End diff --

The examples are updated to provide the location in the CREATE TABLE command.

---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

Github user SangeetaGulia commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1584#discussion_r161513824

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
@@ -0,0 +1,152 @@
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
+import org.apache.spark.sql.SparkSession
+import org.slf4j.{Logger, LoggerFactory}
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+object S3Example {
+
+ /**
+ * This example demonstrate usage of s3 as a store.
+ *
+ * @param args require three parameters "Access-key" "Secret-key"
+ * "s3 bucket path"
+ */
+
+ def main(args: Array[String]) {
+ val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
+ val warehouse = s"$rootPath/examples/spark2/target/warehouse"
+ val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
+ val logger: Logger = LoggerFactory.getLogger(this.getClass)
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
+ .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
+ .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
+ .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
+
+ import org.apache.spark.sql.CarbonSession._
+ if (args.length != 3) {
+ logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
+ "<carbon store location>")
+ System.exit(0)
+ }
+
+ val (accessKey, secretKey) = getKeyOnPrefix(args(2))
+ val spark = SparkSession
+ .builder()
+ .master("local")
+ .appName("CarbonSessionExample")
+ .config("spark.sql.warehouse.dir", warehouse)
+ .config("spark.driver.host", "localhost")
+ .config(accessKey, args(0))
+ .config(secretKey, args(1))
+ .getOrCreateCarbonSession(args(2), warehouse)
+
+ spark.sparkContext.setLogLevel("INFO")
+
+ spark.sql(
+ s"""
+ | CREATE TABLE if not exists carbon_table(
+ | shortField SHORT,
+ | intField INT,
+ | bigintField LONG,
+ | doubleField DOUBLE,
+ | stringField STRING,
+ | timestampField TIMESTAMP,
+ | decimalField DECIMAL(18,2),
+ | dateField DATE,
+ | charField CHAR(5),
+ | floatField FLOAT
+ | )
+ | STORED BY 'carbondata'
--- End diff --

We take the entire location as a command-line argument so that the same example can also be used with s3n.
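The helper `getKeyOnPrefix` called in `S3Example` is not shown in the diff. A hypothetical stand-in, assuming it picks the Hadoop credential property names from the scheme of the store location (which is what lets one example serve s3a, s3n and s3), might look like the sketch below. The property names are the standard Hadoop S3A/S3N/S3 credential keys; returning them with a `spark.hadoop.` prefix is an assumption, chosen because Spark copies `spark.hadoop.*` settings into the Hadoop configuration.

```scala
object S3KeySketch {

  /** Hypothetical stand-in for the example's getKeyOnPrefix: maps the URI
    * scheme of the store location to (access-key, secret-key) property names. */
  def getKeyOnPrefix(path: String): (String, String) = {
    // Everything before the first ':' is treated as the scheme, e.g. "s3a".
    val scheme = path.takeWhile(_ != ':').toLowerCase
    val (access, secret) = scheme match {
      case "s3a" => ("fs.s3a.access.key", "fs.s3a.secret.key")
      case "s3n" => ("fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey")
      case "s3"  => ("fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey")
      case other => throw new IllegalArgumentException(s"Unsupported scheme '$other' in $path")
    }
    // Assumed prefix so SparkSession.builder().config(...) forwards them to Hadoop.
    (s"spark.hadoop.$access", s"spark.hadoop.$secret")
  }

  def main(args: Array[String]): Unit = {
    println(getKeyOnPrefix("s3a://my-bucket/carbon/store"))
    println(getKeyOnPrefix("s3n://my-bucket/carbon/store"))
  }
}
```

The returned pair would then feed `.config(accessKey, args(0))` and `.config(secretKey, args(1))` as in the quoted diff; an unsupported or missing scheme fails fast instead of silently configuring the wrong keys.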

---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

Github user SangeetaGulia closed the pull request at:

https://github.com/apache/carbondata/pull/1584

---