[GitHub] carbondata pull request #1584: [CARBONDATA-1827] Added S3 implementation and...


[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
Github user jatin9896 commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    retest this please


---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    SDV Build Failed, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2631/



---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2453/



---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1232/



---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119279
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1420,6 +1425,16 @@
     
       public static final String CARBON_UPDATE_SYNC_FOLDER_DEFAULT = "/tmp/carbondata";
     
    +  /**
    +   * S3 Constants
    +   */
    +
    +  public static final String S3_IMPLEMENTATION = "fs.s3a.impl";
    --- End diff --
   
    How about S3N support? I think the S3 implementation class should be a property that the user can set, with S3A as the default.
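
The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the property key `carbon.s3.implementation` and the class `S3ImplementationResolver` are hypothetical names introduced here, and only the two Hadoop file system class names are real.

```java
import java.util.Properties;

/** Hypothetical helper: picks the S3 file system class from a user property. */
final class S3ImplementationResolver {

  // Hypothetical property key, introduced for illustration only.
  static final String S3_IMPLEMENTATION_PROPERTY = "carbon.s3.implementation";

  // Hadoop's S3A file system, used when the user sets nothing.
  static final String DEFAULT_S3_IMPLEMENTATION = "org.apache.hadoop.fs.s3a.S3AFileSystem";

  private S3ImplementationResolver() { }

  /** Returns the configured class name, falling back to S3A when unset or blank. */
  static String resolve(Properties userProperties) {
    String configured = userProperties.getProperty(S3_IMPLEMENTATION_PROPERTY);
    if (configured == null || configured.trim().isEmpty()) {
      return DEFAULT_S3_IMPLEMENTATION;
    }
    return configured.trim();
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolve(props)); // nothing set, so the S3A default is used

    // A user who wants S3N would override the property explicitly.
    props.setProperty(S3_IMPLEMENTATION_PROPERTY,
        "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
    System.out.println(resolve(props));
  }
}
```

The resolved class name would then presumably be written to the matching Hadoop key (`fs.s3a.impl` or `fs.s3n.impl`); that wiring is left out because it depends on how the PR sets up the Hadoop configuration.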


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119284
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java ---
    @@ -220,7 +221,16 @@ public static boolean mkdirs(String filePath, FileType fileType) throws IOExcept
        */
       public static DataOutputStream getDataOutputStreamUsingAppend(String path, FileType fileType)
           throws IOException {
    -    return getCarbonFile(path).getDataOutputStreamUsingAppend(path, fileType);
    +    if(FileType.S3 == fileType) {
    --- End diff --
   
    add space after `if`


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119305
 
    --- Diff: core/pom.xml ---
    @@ -94,6 +99,15 @@
             </exclusion>
           </exclusions>
         </dependency>
    +
    +
    --- End diff --
   
    remove empty line


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119354
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,100 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  def main(args: Array[String]) {
    +
    +
    --- End diff --
   
    remove empty line


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119359
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,100 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  def main(args: Array[String]) {
    +
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
    +
    +
    --- End diff --
   
    remove empty line


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119382
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,100 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  def main(args: Array[String]) {
    +
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
    +
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonSessionExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .config("spark.driver.host", "localhost")
    +      .config(ACCESS_KEY, "*****************")
    --- End diff --
   
    It would be better to read all S3-related configuration from the command-line arguments (`args`) and to print the usage of this class, for example:
    ```
    Usage: java CarbonS3Example <fs.s3a.endpoint> <fs.s3a.access.key>
                           <fs.s3a.secret.key> <fs.s3a.impl> <carbon store location>
    ```
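
A minimal sketch of the argument handling suggested above, assuming five positional arguments in the order of the usage line. The class `S3ExampleArgs` and the map key `carbon.store.location` are hypothetical names used only for this illustration; the `fs.s3a.*` keys are the ones named in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical argument handling for an S3 example; not the PR's code. */
final class S3ExampleArgs {

  static final String USAGE =
      "Usage: java CarbonS3Example <fs.s3a.endpoint> <fs.s3a.access.key>"
          + " <fs.s3a.secret.key> <fs.s3a.impl> <carbon store location>";

  // Positional arguments, in the same order as the usage line.
  private static final String[] KEYS = {
      "fs.s3a.endpoint", "fs.s3a.access.key", "fs.s3a.secret.key", "fs.s3a.impl"
  };

  private S3ExampleArgs() { }

  /** Maps positional arguments to configuration keys, or fails with the usage text. */
  static Map<String, String> parse(String[] args) {
    if (args == null || args.length != KEYS.length + 1) {
      throw new IllegalArgumentException(USAGE);
    }
    Map<String, String> config = new LinkedHashMap<>();
    for (int i = 0; i < KEYS.length; i++) {
      config.put(KEYS[i], args[i]);
    }
    // Hypothetical key for the last argument, the store location.
    config.put("carbon.store.location", args[KEYS.length]);
    return config;
  }

  public static void main(String[] args) {
    try {
      Map<String, String> config = parse(args);
      // Print only the keys so that secrets never reach the console.
      System.out.println("Parsed settings: " + config.keySet());
    } catch (IllegalArgumentException e) {
      System.err.println(e.getMessage());
    }
  }
}
```

In the example itself, each parsed entry could then be handed to the session builder's `.config(key, value)` call in place of the masked literals shown in the diff.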


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119405
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/FederationExample.scala ---
    @@ -0,0 +1,146 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object FederationExample {
    --- End diff --
   
    What does this example demonstrate? Please add a comment explaining it, and do the same for the next example.


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119442
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java ---
    @@ -171,6 +171,7 @@ public static boolean createNewFile(
           final FsPermission permission) throws IOException {
         return getCarbonFile(filePath).createNewFile(filePath, fileType, doAs, permission);
       }
    +
    --- End diff --
   
    How about append support? Please add subtasks to CARBONDATA-1827:
    1. support S3 table with dictionary
    2. support compaction on S3 table
    3. support data update/delete on S3 table
    4. support alter table add columns/drop columns on S3 table


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119613
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,100 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  def main(args: Array[String]) {
    +
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
    +
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonSessionExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .config("spark.driver.host", "localhost")
    +      .config(ACCESS_KEY, "*****************")
    +      .config(SECRET_KEY, "***************************")
    +      .getOrCreateCarbonSession(storeLocation, warehouse)
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +    spark.sql("CREATE DATABASE if not exists s3carbondemo LOCATION 's3a://<bucketName>/s3carbondemo'")
    --- End diff --
   
    Add a DROP TABLE before creating the table.
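
One way to read this request, as a sketch only: issue a `DROP TABLE IF EXISTS` ahead of the `CREATE TABLE` so that the example starts from a clean state on every run. The helper `TableSetupStatements` is hypothetical; in the example each returned string would be passed to `spark.sql(...)` in order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that orders the setup statements for a rerunnable example. */
final class TableSetupStatements {

  private TableSetupStatements() { }

  /** Returns the drop statement followed by the caller's create statement. */
  static List<String> dropThenCreate(String qualifiedTable, String createTableSql) {
    List<String> statements = new ArrayList<>();
    // Dropping first keeps leftovers from an earlier run out of the new table.
    statements.add("DROP TABLE IF EXISTS " + qualifiedTable);
    statements.add(createTableSql);
    return statements;
  }

  public static void main(String[] args) {
    String create = "CREATE TABLE if not exists s3carbondemo.carbon_table("
        + "intField INT) STORED BY 'carbondata'";
    for (String sql : dropThenCreate("s3carbondemo.carbon_table", create)) {
      System.out.println(sql);
    }
  }
}
```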


---

[GitHub] carbondata pull request #1584: [CARBONDATA-1827][WIP] Added S3 Implementatio...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1584#discussion_r159119641
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,100 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  def main(args: Array[String]) {
    +
    +
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val storeLocation = s"$rootPath/examples/spark2/target/store"
    +    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
    +    val metastoredb = s"$rootPath/examples/spark2/target"
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
    +
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    val spark = SparkSession
    +      .builder()
    +      .master("local")
    +      .appName("CarbonSessionExample")
    +      .config("spark.sql.warehouse.dir", warehouse)
    +      .config("spark.driver.host", "localhost")
    +      .config(ACCESS_KEY, "*****************")
    +      .config(SECRET_KEY, "***************************")
    +      .getOrCreateCarbonSession(storeLocation, warehouse)
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +    spark.sql("CREATE DATABASE if not exists s3carbondemo LOCATION 's3a://<bucketName>/s3carbondemo'")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists s3carbondemo.carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT,
    +         | complexData ARRAY<STRING>
    +         | )
    +         | STORED BY 'carbondata'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    // scalastyle:off
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    --- End diff --
   
    Add another load so that the table ends up with two segments.
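
A rough sketch of the idea: in CarbonData each successful load creates a new segment, so running the same `LOAD DATA` statement twice should leave the table with two segments. The helper `SegmentLoadStatements` is hypothetical, and because the quoted diff cuts off after `INPATH '$path'`, the `INTO TABLE` clause below is an assumption based on the standard `LOAD DATA` syntax rather than the PR's actual statement.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that repeats a load so the table gets one segment per load. */
final class SegmentLoadStatements {

  private SegmentLoadStatements() { }

  /** Returns loadCount identical LOAD DATA statements for the given CSV and table. */
  static List<String> loads(String csvPath, String qualifiedTable, int loadCount) {
    if (loadCount < 1) {
      throw new IllegalArgumentException("loadCount must be at least 1");
    }
    // Assumed statement shape; the PR's own statement is truncated in the diff.
    String load = "LOAD DATA LOCAL INPATH '" + csvPath + "' INTO TABLE " + qualifiedTable;
    List<String> statements = new ArrayList<>();
    for (int i = 0; i < loadCount; i++) {
      statements.add(load);
    }
    return statements;
  }

  public static void main(String[] args) {
    // Two loads, as the review comment asks for.
    for (String sql : loads("/path/to/data.csv", "s3carbondemo.carbon_table", 2)) {
      System.out.println(sql);
    }
  }
}
```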


---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    SDV Build Failed, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2640/



---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2461/



---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1237/



---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    There is a style check error in the newly added examples.


---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1345/



---

[GitHub] carbondata issue #1584: [CARBONDATA-1827][WIP] Added S3 Implementation

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jatin9896 commented on the issue:

    https://github.com/apache/carbondata/pull/1584
 
    retest this please


---