Apache CarbonData Dev Mailing List archive

Re: Storing Data Frame as CarbonData Table

Posted by mohdshahidkhan on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Storing-Data-Frame-as-CarbonData-Table-tp43874p44293.html

Hi Michael,
I hope the details below help.

1. How should I configure Carbon to get performance comparable with Parquet?
Please refer to the link below for tips on optimizing data loading performance in CarbonData.
https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md#configuration-for-optimizing-data-loading-performance-for-massive-data
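As a rough illustration of the kind of setting that guide covers, the sketch below raises the number of cores CarbonData uses while loading. It assumes a spark-shell with the CarbonData jars on the classpath; the property name is taken from the linked guide as I recall it, and the value 4 is only an illustrative assumption, so please check both against the guide for your version.

```scala
import org.apache.carbondata.core.util.CarbonProperties

// Assumption: "4" is an illustrative value, not a recommendation from this thread.
// Set this before the write so the load picks it up.
CarbonProperties.getInstance()
  .addProperty("carbon.number.of.cores.while.loading", "4")
```

The same key can instead be placed in carbon.properties as a key=value line, which avoids changing the job code.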

2. How to configure carbon.properties?

Set the following Spark properties so that the driver and the executors can locate carbon.properties:

Property:    spark.driver.extraJavaOptions
Value:       -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
Description: A string of extra JVM options to pass to the driver. For
             instance, GC settings or other logging.

Property:    spark.executor.extraJavaOptions
Value:       -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
Description: A string of extra JVM options to pass to executors. For
             instance, GC settings or other logging. *NOTE*: You can e

For more details, refer to the link below.

https://github.com/apache/carbondata/blob/master/docs/installation-guide.md#installing-and-configuring-carbondata-on-standalone-spark-cluster
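To check whether the -D option actually reached a JVM, a small sketch like the one below can help. It relies only on the fact that a -Dkey=value flag shows up as a Java system property; the object name CarbonPropertiesCheck and the method load are hypothetical helpers, not part of CarbonData.

```scala
import java.io.FileInputStream
import java.util.Properties

// Hypothetical helper (not a CarbonData API): reads the file named by
// -Dcarbon.properties.filepath in the current JVM, if the flag was passed.
object CarbonPropertiesCheck {
  val FilePathKey = "carbon.properties.filepath"

  // None if the system property is not set; otherwise the parsed key=value pairs.
  // Throws java.io.FileNotFoundException if the path does not exist.
  def load(): Option[Properties] =
    Option(System.getProperty(FilePathKey)).map { path =>
      val props = new Properties()
      val in = new FileInputStream(path)
      try props.load(in) finally in.close()
      props
    }
}
```

Calling CarbonPropertiesCheck.load() in the Spark shell inspects the driver JVM only; a result of None suggests that spark.driver.extraJavaOptions was not applied.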

On Tue, Apr 3, 2018 at 6:24 PM, Michael Shtelma <[hidden email]> wrote:

> Hi Liang,
>
> Many thanks for your answer!
> It worked this way.
> I am now wondering how I should configure Carbon to get performance
> comparable with Parquet.
> Right now I am using the default properties, actually no properties at all.
> I tried saving one table to Carbon, and it took ages compared to
> Parquet.
> Should I configure the number of writer threads somewhere, or something like that?
> I started the Spark shell with the local[*] option, so I had hoped that
> the write process would use all available cores, but this was not the
> case.
> It looks like only one or two cores are actively used.
>
> Another question: where can I place carbon.properties? If I place it
> in the same folder as spark-defaults.properties, will Carbon
> automatically use it?
>
> Best,
> Michael
>
>
> On Mon, Apr 2, 2018 at 8:53 AM, Liang Chen <[hidden email]>
> wrote:
> > Hi Michael
> >
> > Yes, it is very easy to save any Spark data to CarbonData.
> > You just need to make a small change to your script, as below:
> > myDF.write
> > .format("carbondata")
> > .option("tableName", "MyTable")
> > .mode(SaveMode.Overwrite)
> > .save()
> >
> > For more detail, you can refer to the examples:
> > https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonDataFrameExample.scala
> >
> >
> > HTH.
> >
> > Regards
> > Liang
> >
> >
> > 2018-03-31 18:15 GMT+08:00 Michael Shtelma <[hidden email]>:
> >
> >> Hi Team,
> >>
> >> I am new to CarbonData and wanted to test it using a couple of my test
> >> queries.
> >> In my test I have used CarbonData 1.3.1 and Spark 2.2.1.
> >>
> >> I tried saving my data frame as a CarbonData table using the
> >> following command:
> >>
> >> myDF.write.format("carbondata").mode("overwrite").saveAsTable("MyTable")
> >>
> >> As a result I got the following exception:
> >>
> >> java.lang.IllegalArgumentException: requirement failed: 'path' should not be specified, the path to store carbon file is the 'storePath' specified when creating CarbonContext
> >>   at scala.Predef$.require(Predef.scala:224)
> >>   at org.apache.spark.sql.CarbonSource.createRelation(CarbonSource.scala:90)
> >>   at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:449)
> >>   at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:217)
> >>   at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:177)
> >>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> >>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> >>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
> >>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> >>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> >>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
> >>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> >>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
> >>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
> >>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> >>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> >>   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
> >>   at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
> >>   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
> >>   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
> >>   ... 54 elided
> >>
> >> I am now wondering whether there is a way to save any Spark data frame as
> >> a Hive table backed by the CarbonData format.
> >> Am I doing something wrong?
> >>
> >> Best,
> >> Michael
> >>
> >
> >
> >
> > --
> > Regards
> > Liang
>