
Re: Operation not allowed: STORED BY (from Spark Dataframe save)

Posted by sraghunandan on Aug 16, 2018; 6:45am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Operation-not-allowed-STORED-BY-from-Spark-Dataframe-save-tp59584p59696.html

Hi Yann,
Remove "import spark.implicits._" and use "import carbon.implicits._" instead.
The STORED BY 'carbondata' syntax is only understood by the CarbonSession's
extended SQL parser; with spark.implicits._ the DataFrame is bound to the plain
SparkSession, so the generated CREATE TABLE goes through Spark's default parser,
which rejects STORED BY.


modified code:

import java.io.File

import org.apache.spark.sql.{SaveMode, SparkSession}

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

import org.apache.spark.sql.CarbonSession._

val rootPath = new File(this.getClass.getResource("/").getPath).getCanonicalPath
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore"

CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
  .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
  .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE, "true")
  .addProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC, "")

val carbon = SparkSession.builder()
  .config("spark.sql.warehouse.dir", warehouse)
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreateCarbonSession(storeLocation)

import carbon.implicits._

val df = carbon.sparkContext.parallelize(1 to 50)
  .map(x => ("c1" + x % 10, "c2", x))
  .toDF("col1", "col2", "num")

df.write
  .format("carbondata")
  .option("tableName", "carbon_table")
  .option("partitionColumns", "col1")
  .mode(SaveMode.Overwrite)
  .save()
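To check that the write went through, you can read the table back via the same
CarbonSession (a minimal sketch; it assumes the "carbon" session and the
"carbon_table" created by the snippet above):

// Query the saved table through the CarbonSession's SQL parser.
carbon.sql("SELECT col1, count(*) AS cnt FROM carbon_table GROUP BY col1").show()

// Or load it back as a DataFrame through the carbondata source:
val readBack = carbon.read
  .format("carbondata")
  .option("tableName", "carbon_table")
  .load()
readBack.count()  // 50 rows were written above (1 to 50)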



Regards
Raghu

On Wed, Aug 15, 2018 at 1:05 PM yannv <[hidden email]> wrote:

> Hello,
>
> I am trying to create a CarbonData table from a Spark DataFrame, however I
> am getting an error from the automatically generated CREATE TABLE statement.
>
> I run this code on spark-shell (passing the CarbonData assembly jar for
> 1.4.0 as well as the master branch) on an Azure HDInsight cluster with
> Spark 2.2.1.
>
> Code used :
>
>
> import java.io.File
>
> import org.apache.spark.sql.{SaveMode, SparkSession}
>
> import org.apache.carbondata.core.constants.CarbonCommonConstants
> import org.apache.carbondata.core.util.CarbonProperties
>
> import org.apache.spark.sql.CarbonSession._
>
> import spark.implicits._
>
> val rootPath = new
> File(this.getClass.getResource("/").getPath).getCanonicalPath
> val storeLocation = s"$rootPath/store"
> val warehouse = s"$rootPath/warehouse"
> val metastoredb = s"$rootPath/metastore"
>
> CarbonProperties.getInstance()
>   .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd
> HH:mm:ss")
>   .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
>   .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE, "true")
>   .addProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC, "")
>
>
> val carbon = SparkSession
>   .builder()
>   .config("spark.sql.warehouse.dir", warehouse)
>   .config("spark.sql.crossJoin.enabled", "true")
>   .getOrCreateCarbonSession(storeLocation, metastoredb)
>
>
> val df = carbon.sparkContext.parallelize(1 to 50)
>   .map(x => ("c1" + x % 10, "c2", x))
>   .toDF("col1", "col2", "num")
>
> df.write
>       .format("carbondata")
>       .option("tableName", "carbon_table")
>       .option("partitionColumns", "col1")
>       .mode(SaveMode.Overwrite)
>       .save()
>
>
>
> This is the error I am getting :
>
> org.apache.spark.sql.catalyst.parser.ParseException:
> Operation not allowed: STORED BY(line 5, pos 1)
>
> == SQL ==
>
>  CREATE TABLE IF NOT EXISTS default.carbon_df_table_test1
>  (c2 STRING, number INT)
>  PARTITIONED BY (c1 string)
>  STORED BY 'carbondata'
> -^^^
>
>   TBLPROPERTIES ('STREAMING' = 'false')
>
>
>
>   at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
>   at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1194)
>   at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1186)
>   at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>   at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateFileFormat(SparkSqlParser.scala:1185)
>   at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1$$anonfun$31.apply(SparkSqlParser.scala:1090)
>   at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1$$anonfun$31.apply(SparkSqlParser.scala:1090)
>   at scala.Option.map(Option.scala:146)
>   at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1090)
>
>
>
> I tried various constructors for the carbon object without success.
> Note: I can create a CarbonData table and insert data from a CSV file
> successfully, but it looks like when the save is executed it tries to
> create the (new) table, and I get this error on "STORED BY"...
>
> carbon.sql("create table IF NOT EXISTS carbon_test_csv(id int, name string,
> scale decimal, country string, salary double) STORED BY 'carbondata'")
> carbon.sql("LOAD DATA INPATH '/tmp/sample.csv' INTO TABLE hive_carbon_test_csv")
>
> Regards,
> Yann
>
>
>
>
>
>