http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Storing-Data-Frame-as-CarbonData-Table-tp43874p44268.html
It has worked in this way.
Now I am using default properties, actually no properties at all.
I have tried saving one table to carbon, and it took ages compared to parquet.
It looks like only one or two cores are actively used in this case.
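
A minimal sketch of one way to get more cores involved, assuming the defaults are the bottleneck: carbon.number.of.cores.while.loading defaults to 2 according to the CarbonData configuration docs, which would match seeing only one or two busy cores. The value 8 and the partition count below are only illustrative placeholders, not recommendations:

import org.apache.carbondata.core.util.CarbonProperties
import org.apache.spark.sql.SaveMode

// Raise the number of cores CarbonData uses while loading data.
// The default of 2 would explain only one or two cores being busy;
// 8 here is just an example value.
CarbonProperties.getInstance()
  .addProperty("carbon.number.of.cores.while.loading", "8")

// Repartition so the write is split into more parallel tasks.
// myDF and the partition count are placeholders.
myDF.repartition(8)
  .write
  .format("carbondata")
  .option("tableName", "MyTable")
  .mode(SaveMode.Overwrite)
  .save()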
> Hi Michael
>
> Yes, it is very easy to save any spark data to carbondata.
> Just need to do small change based on your script, as below :
> import org.apache.spark.sql.SaveMode  // needed for SaveMode.Overwrite
>
> myDF.write
>   .format("carbondata")
>   .option("tableName", "MyTable")
>   .mode(SaveMode.Overwrite)
>   .save()
>
> For more detail, you can refer to examples:
>
> https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonDataFrameExample.scala
>
> HTH.
>
> Regards
> Liang
>
>
> 2018-03-31 18:15 GMT+08:00 Michael Shtelma <[hidden email]>:
>
>> Hi Team,
>>
>> I am new to CarbonData and wanted to test it using a couple of my test
>> queries.
>> In my test I have used CarbonData 1.3.1 and Spark 2.2.1.
>>
>> I have tried saving my data frame as a CarbonData table using the
>> following command:
>>
>> myDF.write.format("carbondata").mode("overwrite").saveAsTable("MyTable")
>>
>> As a result I have got the following exception:
>>
>> java.lang.IllegalArgumentException: requirement failed: 'path' should
>> not be specified, the path to store carbon file is the 'storePath'
>> specified when creating CarbonContext
>>
>> at scala.Predef$.require(Predef.scala:224)
>> at org.apache.spark.sql.CarbonSource.createRelation(CarbonSource.scala:90)
>> at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:449)
>> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:217)
>> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:177)
>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
>> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
>> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
>> at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
>> at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
>> at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
>> ... 54 elided
>>
>> I am now wondering whether there is a way to save any Spark data frame
>> as a Hive table backed by the CarbonData format.
>> Am I doing something wrong?
>>
>> Best,
>> Michael
>>
>
>
>
> --
> Regards
> Liang