Hi Team,
I am new to CarbonData and wanted to test it with a couple of my test queries. For the test I used CarbonData 1.3.1 and Spark 2.2.1.

I tried saving my data frame as a CarbonData table using the following command:

    myDF.write.format("carbondata").mode("overwrite").saveAsTable("MyTable")

As a result I got the following exception:

    java.lang.IllegalArgumentException: requirement failed: 'path' should not be specified,
    the path to store carbon file is the 'storePath' specified when creating CarbonContext
      at scala.Predef$.require(Predef.scala:224)
      at org.apache.spark.sql.CarbonSource.createRelation(CarbonSource.scala:90)
      at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:449)
      at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:217)
      at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:177)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
      at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
      at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
      at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
      at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
      at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
      at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
      ... 54 elided

I am now wondering whether there is a way to save any Spark data frame as a Hive table backed by the CarbonData format. Am I doing something wrong?

Best,
Michael
Hi Michael
Yes, it is very easy to save any Spark data frame to CarbonData. You just need a small change to your script, as below:

    myDF.write
      .format("carbondata")
      .option("tableName", "MyTable")
      .mode(SaveMode.Overwrite)
      .save()

For more detail, you can refer to this example:
https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonDataFrameExample.scala

HTH.

Regards
Liang
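For reference, a minimal end-to-end sketch of that write through a CarbonSession, in the spirit of the linked example. The store path (/tmp/carbon/store), the application name, and the sample data frame are illustrative assumptions, not anything from this thread:

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.CarbonSession._   // adds getOrCreateCarbonSession to the builder

    // Assumed local store location; replace with your own local or HDFS path.
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("CarbonWriteSketch")
      .getOrCreateCarbonSession("/tmp/carbon/store")

    import spark.implicits._

    // Hypothetical sample data standing in for myDF.
    val myDF = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")

    // Write the data frame as a CarbonData table named MyTable.
    myDF.write
      .format("carbondata")
      .option("tableName", "MyTable")
      .mode(SaveMode.Overwrite)
      .save()

    // Read the table back through SQL to verify it is registered and queryable.
    spark.sql("SELECT count(*) FROM MyTable").show()

Note that the table name is passed through the tableName option together with save(), rather than through saveAsTable(), which is what triggered the 'path' requirement error quoted above.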
Hi Liang,
Many thanks for your answer! It worked that way.

I am now wondering how I should configure Carbon to get performance comparable with Parquet. At the moment I am using the default properties, actually no properties at all. I tried saving one table to Carbon, and it took ages compared to Parquet. Should I configure the number of writer threads somewhere, or something like that? I started the Spark shell with the local[*] option, so I had hoped that the write process would use all available cores, but this was not the case. It looks like only one or two cores are actively used.

Another question: where can I place carbon.properties? If I place it in the same folder as spark-defaults.conf, will Carbon pick it up automatically?

Best,
Michael
Hi Michael,
Hope the details below help you.

1. How should I configure Carbon to get good loading performance?

Please refer to the link below for optimizing data loading performance in Carbon:
https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md#configuration-for-optimizing-data-loading-performance-for-massive-data

2. How to configure carbon.properties?

Point both the driver and the executors at the file through extra JVM options:

    Property:    spark.driver.extraJavaOptions
    Value:       -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
    Description: A string of extra JVM options to pass to the driver. For instance, GC settings or other logging.

    Property:    spark.executor.extraJavaOptions
    Value:       -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
    Description: A string of extra JVM options to pass to executors. For instance, GC settings or other logging.

NOTE: You can enter multiple values separated by space.
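To make the above concrete, here is a hedged configuration sketch. The spark-defaults.conf entries follow directly from the reply above (with an assumed concrete path instead of $SPARK_HOME, since spark-defaults.conf values are not shell-expanded); the carbon.properties keys and values are illustrative assumptions drawn from the loading-performance tuning guide linked earlier and should be verified against that document for your CarbonData version:

    # $SPARK_HOME/conf/spark-defaults.conf
    # Point both the driver and executor JVMs at the carbon.properties file (path is an assumption).
    spark.driver.extraJavaOptions    -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
    spark.executor.extraJavaOptions  -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties

    # /opt/spark/conf/carbon.properties
    # Illustrative loading-related settings (assumed values, tune per the useful-tips guide).
    # Number of cores used while loading data; raising this should use more of local[*].
    carbon.number.of.cores.while.loading=8
    # Number of records sorted at once during data loading.
    carbon.sort.size=100000

In local[*] mode everything runs inside the driver JVM, so the driver-side option is the one that takes effect for the spark-shell test described above.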
Hi Michael,
Adding to my previous reply: for more details on installing and configuring CarbonData on a Spark cluster, including how carbon.properties is wired in, you can refer to:
https://github.com/apache/carbondata/blob/master/docs/installation-guide.md#installing-and-configuring-carbondata-on-standalone-spark-cluster