[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 implementation and...

89 messages

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r161362393
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,152 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    --- End diff --
   
    Please do the same for the other 3 examples as well.


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r161362429
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/MultiStoreExample.scala ---
    @@ -0,0 +1,153 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object MultiStoreExample {
    +
    +  /** This example demonstrate the usage of multiple filesystem(s3 and local) on one carbon session
    +   *
    +   * @param args represents "fs.s3a.access.key" "fs.s3a.secret.key" "bucket-name"
    +   */
    +
    +  def main(args: Array[String]) {
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    if (args.length != 3) {
    +      logger.error("Usage: java CarbonS3Example <fs.s3a.access.key> <fs.s3a.secret" +
    +              ".key> <bucket-name>")
    +      System.exit(0)
    +    }
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonSessionExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .config("spark.driver.host", "localhost")
    +      .config("spark.hadoop." + ACCESS_KEY, args(0))
    +      .config("spark.hadoop." + SECRET_KEY, args(1))
    +      .getOrCreateCarbonSession(storeLocation, warehouse)
    --- End diff --
   
    Do not give a store location; use `CREATE TABLE ... LOCATION ...` to determine the location for each table.
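
    A minimal sketch of the per-table approach, assuming CarbonData accepts a `LOCATION` clause after `STORED BY 'carbondata'`. The helper `createTableSql`, the table names, the bucket name, and the local path are hypothetical and do not come from the pull request; the sketch only builds the statements and does not need a running session.

```scala
object CreateTableLocationSketch {

  // Hypothetical helper: builds a CREATE TABLE statement that pins one table
  // to its own location, instead of relying on a session-wide store location.
  def createTableSql(table: String, location: String): String =
    s"""
       | CREATE TABLE IF NOT EXISTS $table(
       |   intField INT,
       |   stringField STRING
       | )
       | STORED BY 'carbondata'
       | LOCATION '$location'
     """.stripMargin.trim

  def main(args: Array[String]): Unit = {
    // Placeholder bucket and local path (assumptions, not from the pull request).
    val s3Table = createTableSql("carbon_table_s3", "s3a://my-bucket/carbon_table_s3")
    val localTable = createTableSql("carbon_table_local", "/tmp/carbon_table_local")
    println(s3Table)
    println(localTable)
    // With a CarbonSession in scope, each statement would be passed to spark.sql(...).
  }
}
```

    With this shape, one session can hold tables on S3 and on the local filesystem side by side, because the location travels with each table rather than with the session.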


---

[GitHub] carbondata issue #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    Please change the merge target branch to carbonstore, since master is currently frozen for the 1.3.0 release.


---

[GitHub] carbondata issue #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user SangeetaGulia commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    @jackylk we have raised a new PR, [#1805](https://github.com/apache/carbondata/pull/1805), with all the review comments resolved. It targets the carbonstore branch.
   
    Please review it.


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user SangeetaGulia commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r161512369
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/MultiStoreExample.scala ---
    @@ -0,0 +1,153 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object MultiStoreExample {
    +
    +  /** This example demonstrate the usage of multiple filesystem(s3 and local) on one carbon session
    +   *
    +   * @param args represents "fs.s3a.access.key" "fs.s3a.secret.key" "bucket-name"
    +   */
    +
    +  def main(args: Array[String]) {
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    if (args.length != 3) {
    +      logger.error("Usage: java CarbonS3Example <fs.s3a.access.key> <fs.s3a.secret" +
    +              ".key> <bucket-name>")
    +      System.exit(0)
    +    }
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonSessionExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .config("spark.driver.host", "localhost")
    +      .config("spark.hadoop." + ACCESS_KEY, args(0))
    +      .config("spark.hadoop." + SECRET_KEY, args(1))
    +      .getOrCreateCarbonSession(storeLocation, warehouse)
    --- End diff --
   
    Refactored to provide the store location in the CREATE TABLE command.


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user SangeetaGulia commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r161512456
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3CsvExample.scala ---
    @@ -0,0 +1,113 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3CsvExample {
    +
    +  /**
    +   * This example demonstrate to create local store having csv on s3.
    --- End diff --
   
    Done.


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user SangeetaGulia commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r161513550
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,152 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length != 3) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<carbon store location>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonSessionExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .getOrCreateCarbonSession(args(2), warehouse)
    --- End diff --
   
    The examples are updated so that the location is provided in the CREATE TABLE command.


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user SangeetaGulia commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r161513824
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,152 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length != 3) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<carbon store location>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonSessionExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .getOrCreateCarbonSession(args(2), warehouse)
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    --- End diff --
   
    We take the entire location as a command-line argument so that the same example can also be used for s3n.
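
    The diff calls `getKeyOnPrefix(args(2))` but does not show its body. The following is a sketch of how such a helper might pick credential property names from the scheme of the location, which is what would let one example serve both s3a and s3n. The mapping is an assumption built from the standard Hadoop property names for each connector, prefixed with `spark.hadoop.` as in the diff; it is not the author's implementation, and the bucket name is a placeholder.

```scala
object S3KeySketch {

  // Assumed behaviour of a getKeyOnPrefix-style helper: choose the Hadoop
  // credential property names that match the scheme of the store location.
  def getKeyOnPrefix(path: String): (String, String) = {
    val prefix = "spark.hadoop."
    if (path.startsWith("s3a://")) {
      (prefix + "fs.s3a.access.key", prefix + "fs.s3a.secret.key")
    } else if (path.startsWith("s3n://")) {
      (prefix + "fs.s3n.awsAccessKeyId", prefix + "fs.s3n.awsSecretAccessKey")
    } else if (path.startsWith("s3://")) {
      (prefix + "fs.s3.awsAccessKeyId", prefix + "fs.s3.awsSecretAccessKey")
    } else {
      // Fail loudly rather than silently configuring the wrong connector.
      throw new IllegalArgumentException(s"Unsupported store location scheme: $path")
    }
  }

  def main(args: Array[String]): Unit = {
    // Placeholder location (assumption); the example in the diff reads it from args(2).
    val (accessKey, secretKey) = getKeyOnPrefix("s3n://my-bucket/store")
    println(accessKey)
    println(secretKey)
  }
}
```

    The returned pair would then be passed as the keys in `.config(accessKey, args(0))` and `.config(secretKey, args(1))`, as the session builder in the diff does.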


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user SangeetaGulia closed the pull request at:

    https://github.com/apache/carbondata/pull/1584


---