GitHub user xubo245 opened a pull request:
    https://github.com/apache/carbondata/pull/3054

    [CARBONDATA-3232] Optimize carbonData using alluxio

    Optimize carbonData using alluxio:
    1.Add doc
    2.optimize the example

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

     - [ ] Any interfaces changed?
       No
     - [ ] Any backward compatibility impacted?
       No
     - [ ] Document update required?
       Yes
     - [ ] Testing done
       Please provide details on
       - Whether new unit test cases have been added or why no new tests are required?
       - How it is tested? Please attach test report.
       - Is it a performance related change? Please attach the performance test report.
       - Any additional information to help reviewers in testing this change.
       optimize the example
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
       No

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-3232_OptimizeSupportAlluxio

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3054.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3054

----
commit 4ccc9fefaf590f092fa64978ebc3ce0b8533d437
Author: xubo245 <xubo29@...>
Date: 2019-01-07T12:27:37Z

    [CARBONDATA-3232] Optimize carbonData using alluxio

----
---
Github user kevinjmh commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245639533

--- Diff: docs/Integration/alluxio-guide.md ---
@@ -0,0 +1,44 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements. See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+
+# Presto guide
+This tutorial provides a quick introduction to using Alluxio.
+
+## How to use Alluxio for CarbonData?
+### Install and start Alluxio
+Please refer to [https://www.alluxio.org/docs/1.8/en/Getting-Started.html#starting-alluxio](https://www.alluxio.org/docs/1.8/en/Getting-Started.html#starting-alluxio)
+Access the Alluxio web: [http://localhost:19999/home](http://localhost:19999/home)
+By command, for example:
+```$xslt
+./bin/alluxio fs ls /
+```
+Result:
+```
+drwxr-xr-x xubo staff 1 NOT_PERSISTED 01-07-2019 15:39:24:960 DIR /carbondata
+-rw-r--r-- xubo staff 50686 NOT_PERSISTED 01-07-2019 11:37:48:924 100% /data.csv
+```
+### Upload Alluxio jar to CarbonData
+Upload the jar "/alluxio_path/client/alluxio-YOUR-VERSION-client.jar" to CarbonData
--- End diff --

"Upload to CarbonData" is confusing. What we need to do is to add the alluxio client jar to classpath, right?
---
In reply to this post by qiuchenjian-2
Github user kevinjmh commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245639998

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
@@ -26,48 +30,88 @@ import org.apache.carbondata.examples.util.ExampleUtils
 
 /**
- * configure alluxio:
- * 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
- */
+ * configure alluxio:
+ * 1.start alluxio
+ * 2.upload the jar: "/alluxio_path/core/client/target/
+ * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
+ * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ */
 object AlluxioExample {
-  def main(args: Array[String]) {
-    val spark = ExampleUtils.createCarbonSession("AlluxioExample")
-    exampleBody(spark)
-    spark.close()
+  def main (args: Array[String]) {
+    val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+      storePath = "alluxio://localhost:19998/carbondata")
+    exampleBody(carbon)
+    carbon.close()
   }
 
-  def exampleBody(spark : SparkSession): Unit = {
+  def exampleBody (spark: SparkSession): Unit = {
+    val rootPath = new File(this.getClass.getResource("/").getPath
+      + "../../../..").getCanonicalPath
     spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
     FileFactory.getConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
 
     // Specify date format based on raw data
     CarbonProperties.getInstance()
       .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
 
-    spark.sql("DROP TABLE IF EXISTS alluxio_table")
+    val mFsShell = new FileSystemShell()
+    val localFile = rootPath + "/hadoop/src/test/resources/data.csv"
+    val remotePath = "/carbon_alluxio.csv"
+    val remoteFile = "alluxio://localhost:19998/carbon_alluxio.csv"
+    mFsShell.run("rm", remotePath)
--- End diff --

As an example, I think we should not do this operation
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3054

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2195/
---
Github user kevinjmh commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245641130

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/util/ExampleUtils.scala ---
@@ -30,13 +30,17 @@ object ExampleUtils {
       .getCanonicalPath
   val storeLocation: String = currentPath + "/target/store"
 
-  def createCarbonSession(appName: String, workThreadNum: Int = 1): SparkSession = {
+  def createCarbonSession (appName: String, workThreadNum: Int = 1,
+    storePath: String = null): SparkSession = {
     val rootPath = new File(this.getClass.getResource("/").getPath
-      + "../../../..").getCanonicalPath
-    val storeLocation = s"$rootPath/examples/spark2/target/store"
+      + "../../../..").getCanonicalPath
+    var storeLocation = s"$rootPath/examples/spark2/target/store"
     val warehouse = s"$rootPath/examples/spark2/target/warehouse"
     val metaStoreDB = s"$rootPath/examples/spark2/target"
+    if (storePath != null) {
+      storeLocation = storePath;
+    }
--- End diff --

```suggestion
val storeLocation = if (null != storePath) {
  storePath
} else {
  s"$rootPath/examples/spark2/target/store"
}
```
---
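The suggestion above replaces a mutable `var` plus a null check with a single `val` bound to an `if` expression. A minimal standalone sketch of that idea follows; the object name and the two helper functions are hypothetical and are not part of `ExampleUtils`, and the `Option`-based variant is an alternative that was not proposed in the review.

```scala
object StoreLocationSketch {
  // Hypothetical helper (not part of ExampleUtils): returns the caller's store
  // path when one is given, otherwise the default path under rootPath.
  def resolveStoreLocation(rootPath: String, storePath: String = null): String =
    if (null != storePath) storePath else s"$rootPath/examples/spark2/target/store"

  // Equivalent form that wraps the nullable argument in Option.
  // Option(null) is None, so getOrElse falls back to the default path.
  def resolveWithOption(rootPath: String, storePath: String = null): String =
    Option(storePath).getOrElse(s"$rootPath/examples/spark2/target/store")

  def main(args: Array[String]): Unit = {
    // The root path here is an arbitrary placeholder.
    println(resolveStoreLocation("/tmp/carbondata"))
    println(resolveWithOption("/tmp/carbondata", "alluxio://localhost:19998/carbondata"))
  }
}
```

Either form keeps `storeLocation` immutable, which is the point of the review comment.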
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3054

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10451/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3054

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2411/
---
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245661158

--- Diff: docs/Integration/alluxio-guide.md ---
@@ -0,0 +1,44 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements. See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+
+# Presto guide
+This tutorial provides a quick introduction to using Alluxio.
--- End diff --

```suggestion
This tutorial provides a brief introduction to using Alluxio.
```
---
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245676210

--- Diff: README.md ---
@@ -68,8 +68,8 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
 * [FAQs](https://github.com/apache/carbondata/blob/master/docs/faq.md)
 
 ## Integration
-* [Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)
-* [Presto](https://github.com/apache/carbondata/blob/master/docs/presto-guide.md)
+* [Hive](https://github.com/apache/carbondata/blob/master/docs/Integration/hive-guide.md)
--- End diff --

Don't suggest creating many folders under docs.
---
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/3054

The PR title is not consistent with the PR content. How about: Add example for alluxio integration
---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245890285

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala ---
@@ -26,48 +30,88 @@ import org.apache.carbondata.examples.util.ExampleUtils
 
 /**
- * configure alluxio:
- * 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
- */
+ * configure alluxio:
+ * 1.start alluxio
+ * 2.upload the jar: "/alluxio_path/core/client/target/
+ * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
+ * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ */
 object AlluxioExample {
-  def main(args: Array[String]) {
-    val spark = ExampleUtils.createCarbonSession("AlluxioExample")
-    exampleBody(spark)
-    spark.close()
+  def main (args: Array[String]) {
+    val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+      storePath = "alluxio://localhost:19998/carbondata")
+    exampleBody(carbon)
+    carbon.close()
   }
 
-  def exampleBody(spark : SparkSession): Unit = {
+  def exampleBody (spark: SparkSession): Unit = {
+    val rootPath = new File(this.getClass.getResource("/").getPath
+      + "../../../..").getCanonicalPath
     spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
     FileFactory.getConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")
 
     // Specify date format based on raw data
     CarbonProperties.getInstance()
       .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
 
-    spark.sql("DROP TABLE IF EXISTS alluxio_table")
+    val mFsShell = new FileSystemShell()
+    val localFile = rootPath + "/hadoop/src/test/resources/data.csv"
+    val remotePath = "/carbon_alluxio.csv"
+    val remoteFile = "alluxio://localhost:19998/carbon_alluxio.csv"
+    mFsShell.run("rm", remotePath)
--- End diff --

I added timestamp for temp file
---
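The reply above says a timestamp was added to the temp file name, which would avoid having to `rm` a fixed remote path before each run. The thread does not show the updated code, so the following is only a sketch of one way this could look; the object name, the helper `timestampedRemotePath`, and the exact file name pattern are assumptions.

```scala
object TempFileNameSketch {
  // Hypothetical helper: builds a remote path that is unique per run by
  // appending a millisecond timestamp to the base file name.
  def timestampedRemotePath(baseName: String,
      now: Long = System.currentTimeMillis()): String =
    s"/${baseName}_${now}.csv"

  def main(args: Array[String]): Unit = {
    val remotePath = timestampedRemotePath("carbon_alluxio")
    // The Alluxio master address is taken from the diff in this thread.
    val remoteFile = s"alluxio://localhost:19998$remotePath"
    println(remoteFile)
  }
}
```

Because each run writes to a different path, the example no longer needs to delete a file that another user may depend on, although old temp files would still have to be cleaned up separately.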
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245890480

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/util/ExampleUtils.scala ---
@@ -30,13 +30,17 @@ object ExampleUtils {
       .getCanonicalPath
   val storeLocation: String = currentPath + "/target/store"
 
-  def createCarbonSession(appName: String, workThreadNum: Int = 1): SparkSession = {
+  def createCarbonSession (appName: String, workThreadNum: Int = 1,
+    storePath: String = null): SparkSession = {
     val rootPath = new File(this.getClass.getResource("/").getPath
-      + "../../../..").getCanonicalPath
-    val storeLocation = s"$rootPath/examples/spark2/target/store"
+      + "../../../..").getCanonicalPath
+    var storeLocation = s"$rootPath/examples/spark2/target/store"
     val warehouse = s"$rootPath/examples/spark2/target/warehouse"
     val metaStoreDB = s"$rootPath/examples/spark2/target"
+    if (storePath != null) {
+      storeLocation = storePath;
+    }
--- End diff --

ok
---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245893187

--- Diff: docs/Integration/alluxio-guide.md ---
@@ -0,0 +1,44 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements. See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+
+# Presto guide
+This tutorial provides a quick introduction to using Alluxio.
+
+## How to use Alluxio for CarbonData?
+### Install and start Alluxio
+Please refer to [https://www.alluxio.org/docs/1.8/en/Getting-Started.html#starting-alluxio](https://www.alluxio.org/docs/1.8/en/Getting-Started.html#starting-alluxio)
+Access the Alluxio web: [http://localhost:19999/home](http://localhost:19999/home)
+By command, for example:
+```$xslt
+./bin/alluxio fs ls /
+```
+Result:
+```
+drwxr-xr-x xubo staff 1 NOT_PERSISTED 01-07-2019 15:39:24:960 DIR /carbondata
+-rw-r--r-- xubo staff 50686 NOT_PERSISTED 01-07-2019 11:37:48:924 100% /data.csv
+```
+### Upload Alluxio jar to CarbonData
+Upload the jar "/alluxio_path/client/alluxio-YOUR-VERSION-client.jar" to CarbonData
--- End diff --

No need for example now, I added dependency
---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245893905

--- Diff: docs/Integration/alluxio-guide.md ---
@@ -0,0 +1,44 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements. See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+
+# Presto guide
+This tutorial provides a quick introduction to using Alluxio.
--- End diff --

ok, done
---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245893919

--- Diff: README.md ---
@@ -68,8 +68,8 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
 * [FAQs](https://github.com/apache/carbondata/blob/master/docs/faq.md)
 
 ## Integration
-* [Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)
-* [Presto](https://github.com/apache/carbondata/blob/master/docs/presto-guide.md)
+* [Hive](https://github.com/apache/carbondata/blob/master/docs/Integration/hive-guide.md)
--- End diff --

ok, done
---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/3054

ok, changed @chenliang613
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3054

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2206/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3054

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2208/
---
Github user kevinjmh commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245902021

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/util/ExampleUtils.scala ---
@@ -30,13 +30,20 @@ object ExampleUtils {
       .getCanonicalPath
   val storeLocation: String = currentPath + "/target/store"
 
-  def createCarbonSession(appName: String, workThreadNum: Int = 1): SparkSession = {
+  def createCarbonSession (appName: String, workThreadNum: Int = 1,
+    storePath: String = null): SparkSession = {
     val rootPath = new File(this.getClass.getResource("/").getPath
-      + "../../../..").getCanonicalPath
-    val storeLocation = s"$rootPath/examples/spark2/target/store"
+      + "../../../..").getCanonicalPath
+
     val warehouse = s"$rootPath/examples/spark2/target/warehouse"
     val metaStoreDB = s"$rootPath/examples/spark2/target"
+    val storeLocation = if (null != storePath) {
+      storePath;
--- End diff --

no need for `;`
---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3054#discussion_r245905219

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/util/ExampleUtils.scala ---
@@ -30,13 +30,20 @@ object ExampleUtils {
       .getCanonicalPath
   val storeLocation: String = currentPath + "/target/store"
 
-  def createCarbonSession(appName: String, workThreadNum: Int = 1): SparkSession = {
+  def createCarbonSession (appName: String, workThreadNum: Int = 1,
+    storePath: String = null): SparkSession = {
     val rootPath = new File(this.getClass.getResource("/").getPath
-      + "../../../..").getCanonicalPath
-    val storeLocation = s"$rootPath/examples/spark2/target/store"
+      + "../../../..").getCanonicalPath
+
     val warehouse = s"$rootPath/examples/spark2/target/warehouse"
     val metaStoreDB = s"$rootPath/examples/spark2/target"
+    val storeLocation = if (null != storePath) {
+      storePath;
--- End diff --

ok
---