[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Optimize carbonData using a...


[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2426/



---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2431/



---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10467/



---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2212/



---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246044066
 
    --- Diff: docs/alluxio-guide.md ---
    @@ -0,0 +1,42 @@
    +<!--
    +    Licensed to the Apache Software Foundation (ASF) under one or more
    +    contributor license agreements.  See the NOTICE file distributed with
    +    this work for additional information regarding copyright ownership.
    +    The ASF licenses this file to you under the Apache License, Version 2.0
    +    (the "License"); you may not use this file except in compliance with
    +    the License.  You may obtain a copy of the License at
    +
    +      http://www.apache.org/licenses/LICENSE-2.0
    +
    +    Unless required by applicable law or agreed to in writing, software
    +    distributed under the License is distributed on an "AS IS" BASIS,
    +    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +    See the License for the specific language governing permissions and
    +    limitations under the License.
    +-->
    +
    +
    +# Presto guide
    --- End diff --
   
    Presto? This file is the Alluxio guide, so the title should not say Presto.


---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246047276
 
    --- Diff: docs/documentation.md ---
    @@ -29,15 +29,15 @@ Apache CarbonData is a new big data file format for faster interactive query usi
     
     **Quick Start:** [Run an example program](./quick-start-guide.md#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) on your local machine or [study some examples](https://github.com/apache/carbondata/tree/master/examples/spark2/src/main/scala/org/apache/carbondata/examples).
     
    -**CarbonData SQL Language Reference:** CarbonData extends the Spark SQL language and adds several [DDL](./ddl-of-carbondata.md) and [DML](./dml-of-carbondata.md) statements to support operations on it.Refer to the [Reference Manual](./language-manual.md) to understand the supported features and functions.
    +**CarbonData SQL Language Reference:** CarbonData extends the Spark SQL language and adds several [DDL](./ddl-of-carbondata.md) and [DML](./dml-of-carbondata.md) statements to support operations on it. Refer to the [Reference Manual](./language-manual.md) to understand the supported features and functions.
     
     **Programming Guides:** You can read our guides about [Java APIs supported](./sdk-guide.md) or [C++ APIs supported](./csdk-guide.md) to learn how to integrate CarbonData with your applications.
     
     
     
     ## Integration
     
    -CarbonData can be integrated with popular Execution engines like [Spark](./quick-start-guide.md#spark) , [Presto](./quick-start-guide.md#presto) and [Hive](./quick-start-guide.md#hive).Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of Integrating CarbonData.
    +CarbonData can be integrated with popular Execution engines like [Spark](./quick-start-guide.md#spark) , [Presto](./quick-start-guide.md#presto) and [Hive](./quick-start-guide.md#hive). CarbonData also supports read and write with [Alluxio](./quick-start-guide.md#alluxio). Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of Integrating CarbonData.
    --- End diff --
   
    I don't think it is appropriate to mention Alluxio right after execution engines (lowercase "e", not "Execution") like SparkSQL/Presto/Hive.
   
    Instead, we can add a separate paragraph mentioning that CarbonData can integrate with storage engines such as HDFS, S3, OBS, and Alluxio.
   
    @chenliang613 What do you think?


---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246047576
 
    --- Diff: docs/quick-start-guide.md ---
    @@ -54,7 +54,8 @@ CarbonData can be integrated with Spark,Presto and Hive Execution Engines. The b
     ### Hive
     [Installing and Configuring CarbonData on Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)
     
    -
    +### Alluxio
    --- End diff --
   
    As mentioned above, we may need to adjust the location for this section.


---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246049322
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
    @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
     /**
      * configure alluxio:
      * 1.start alluxio
    - * 2.upload the jar :"/alluxio_path/core/client/target/
    - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
    - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
    + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
      */
    -
     object AlluxioExample {
    -  def main(args: Array[String]) {
    -    val spark = ExampleUtils.createCarbonSession("AlluxioExample")
    -    exampleBody(spark)
    -    spark.close()
    +  def main (args: Array[String]) {
    +    val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
    +      storePath = "alluxio://localhost:19998/carbondata")
    +    exampleBody(carbon)
    +    carbon.close()
       }
     
    -  def exampleBody(spark : SparkSession): Unit = {
    +  def exampleBody (spark: SparkSession): Unit = {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +      + "../../../..").getCanonicalPath
         spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
         FileFactory.getConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
     
         // Specify date format based on raw data
         CarbonProperties.getInstance()
           .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
     
    -    spark.sql("DROP TABLE IF EXISTS alluxio_table")
    +    val time = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    +
    +    val mFsShell = new FileSystemShell()
    +    val localFile = rootPath + "/hadoop/src/test/resources/data.csv"
    +    val remotePath = "/carbon_alluxio" + time + ".csv"
    +    val remoteFile = "alluxio://localhost:19998/carbon_alluxio" + time + ".csv"
    --- End diff --
   
    Use 'prefix + remotePath' instead of concatenating the path by hand.
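
    A minimal sketch of that suggestion, assuming a hypothetical `alluxioPrefix` value holding the master address that the diff hard-codes (the diff itself defines no such value). The object and method names are illustrative only:

```scala
import java.text.SimpleDateFormat
import java.util.Date

object RemotePathSketch {
  // Hypothetical name: the Alluxio master address that appears in the diff.
  val alluxioPrefix = "alluxio://localhost:19998"

  // Path inside Alluxio, built once from the timestamp.
  def remotePath(time: String): String = "/carbon_alluxio" + time + ".csv"

  // Full URI derived from the prefix and the path, so the two cannot drift apart.
  def remoteFile(time: String): String = alluxioPrefix + remotePath(time)

  def main(args: Array[String]): Unit = {
    val time = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    println(remotePath(time))
    println(remoteFile(time))
  }
}
```

    Deriving the URI from the path means a change to the file name only has to be made in one place.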


---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246050916
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
    @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
     /**
      * configure alluxio:
      * 1.start alluxio
    - * 2.upload the jar :"/alluxio_path/core/client/target/
    - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
    - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
    + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
      */
    -
     object AlluxioExample {
    -  def main(args: Array[String]) {
    -    val spark = ExampleUtils.createCarbonSession("AlluxioExample")
    -    exampleBody(spark)
    -    spark.close()
    +  def main (args: Array[String]) {
    +    val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
    +      storePath = "alluxio://localhost:19998/carbondata")
    +    exampleBody(carbon)
    +    carbon.close()
       }
     
    -  def exampleBody(spark : SparkSession): Unit = {
    +  def exampleBody (spark: SparkSession): Unit = {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +      + "../../../..").getCanonicalPath
         spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
    --- End diff --
   
    Providing only a DataFrame example is not enough. It seems we also need to add some configuration to the carbon properties file and the Spark properties to make this work through beeline. We should document that clearly in case users want to try it from beeline.
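
    As a rough illustration of the Spark side of this, the sketch below turns the Hadoop setting used in the diff into a Spark property line. It relies on Spark copying any `spark.hadoop.<key>` property into its Hadoop configuration as `<key>`. The object and method names are hypothetical, and this thread does not establish that this setting alone is sufficient for beeline or CarbonThriftServer; the Alluxio client jar must also be on the classpath.

```scala
object AlluxioSparkPropsSketch {
  // Hadoop-level setting taken from the example in the diff.
  val hadoopProps: Map[String, String] =
    Map("fs.alluxio.impl" -> "alluxio.hadoop.FileSystem")

  // Prefix each Hadoop key with "spark.hadoop." so Spark forwards it
  // to its Hadoop configuration.
  def toSparkProps(props: Map[String, String]): Map[String, String] =
    props.map { case (k, v) => ("spark.hadoop." + k) -> v }

  // Render "key value" lines, the whitespace-separated form used in spark-defaults.conf.
  def render(props: Map[String, String]): String =
    props.toSeq.sorted.map { case (k, v) => s"$k $v" }.mkString("\n")

  def main(args: Array[String]): Unit =
    println(render(toSparkProps(hadoopProps)))
}
```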


---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246249301
 
    --- Diff: docs/alluxio-guide.md ---
    @@ -0,0 +1,42 @@
    +<!--
    +    Licensed to the Apache Software Foundation (ASF) under one or more
    +    contributor license agreements.  See the NOTICE file distributed with
    +    this work for additional information regarding copyright ownership.
    +    The ASF licenses this file to you under the Apache License, Version 2.0
    +    (the "License"); you may not use this file except in compliance with
    +    the License.  You may obtain a copy of the License at
    +
    +      http://www.apache.org/licenses/LICENSE-2.0
    +
    +    Unless required by applicable law or agreed to in writing, software
    +    distributed under the License is distributed on an "AS IS" BASIS,
    +    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +    See the License for the specific language governing permissions and
    +    limitations under the License.
    +-->
    +
    +
    +# Presto guide
    --- End diff --
   
    changed


---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246249496
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
    @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
     /**
      * configure alluxio:
      * 1.start alluxio
    - * 2.upload the jar :"/alluxio_path/core/client/target/
    - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
    - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
    + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
      */
    -
     object AlluxioExample {
    -  def main(args: Array[String]) {
    -    val spark = ExampleUtils.createCarbonSession("AlluxioExample")
    -    exampleBody(spark)
    -    spark.close()
    +  def main (args: Array[String]) {
    +    val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
    +      storePath = "alluxio://localhost:19998/carbondata")
    +    exampleBody(carbon)
    +    carbon.close()
       }
     
    -  def exampleBody(spark : SparkSession): Unit = {
    +  def exampleBody (spark: SparkSession): Unit = {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +      + "../../../..").getCanonicalPath
         spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
         FileFactory.getConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
     
         // Specify date format based on raw data
         CarbonProperties.getInstance()
           .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
     
    -    spark.sql("DROP TABLE IF EXISTS alluxio_table")
    +    val time = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    +
    +    val mFsShell = new FileSystemShell()
    +    val localFile = rootPath + "/hadoop/src/test/resources/data.csv"
    +    val remotePath = "/carbon_alluxio" + time + ".csv"
    +    val remoteFile = "alluxio://localhost:19998/carbon_alluxio" + time + ".csv"
    --- End diff --
   
    ok


---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246280098
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
    @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
     /**
      * configure alluxio:
      * 1.start alluxio
    - * 2.upload the jar :"/alluxio_path/core/client/target/
    - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
    - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
    + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
      */
    -
     object AlluxioExample {
    -  def main(args: Array[String]) {
    -    val spark = ExampleUtils.createCarbonSession("AlluxioExample")
    -    exampleBody(spark)
    -    spark.close()
    +  def main (args: Array[String]) {
    +    val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
    +      storePath = "alluxio://localhost:19998/carbondata")
    +    exampleBody(carbon)
    +    carbon.close()
       }
     
    -  def exampleBody(spark : SparkSession): Unit = {
    +  def exampleBody (spark: SparkSession): Unit = {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +      + "../../../..").getCanonicalPath
         spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
    --- End diff --
   
    spark-shell and spark-submit work now, but CarbonThriftServer and beeline still have some problems.


---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2232/



---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2450/



---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10489/



---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2459/



---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2239/



---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10498/



---

[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3054#discussion_r246427391
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
    @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
     /**
      * configure alluxio:
      * 1.start alluxio
    - * 2.upload the jar :"/alluxio_path/core/client/target/
    - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
    - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
    + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html
      */
    -
     object AlluxioExample {
    -  def main(args: Array[String]) {
    -    val spark = ExampleUtils.createCarbonSession("AlluxioExample")
    -    exampleBody(spark)
    -    spark.close()
    +  def main (args: Array[String]) {
    +    val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
    +      storePath = "alluxio://localhost:19998/carbondata")
    +    exampleBody(carbon)
    +    carbon.close()
       }
     
    -  def exampleBody(spark : SparkSession): Unit = {
    +  def exampleBody (spark: SparkSession): Unit = {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +      + "../../../..").getCanonicalPath
         spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
    --- End diff --
   
    In that case, you need to mention this limitation in the current document.


---

[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3054
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2246/



---