Operation not allowed: STORED BY (from Spark Dataframe save)


Operation not allowed: STORED BY (from Spark Dataframe save)

yannv
Hello,

I am trying to create a CarbonData table from a Spark DataFrame; however, I
am getting an error from the automatically generated CREATE TABLE statement.

I run this code in spark-shell (passing the CarbonData assembly jar for
1.4.0 as well as for the master branch), on an Azure HDInsight cluster with
Spark 2.2.1.

This is the error I am getting:


org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: STORED BY(line 5, pos 1)

== SQL ==

 CREATE TABLE IF NOT EXISTS default.carbon_df_table_test1
 (c2 STRING, number INT)
 PARTITIONED BY (c1 string)
 STORED BY 'carbondata'
-^^^

  TBLPROPERTIES ('STREAMING' = 'false')



  at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1194)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1186)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateFileFormat(SparkSqlParser.scala:1185)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1$$anonfun$31.apply(SparkSqlParser.scala:1090)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1$$anonfun$31.apply(SparkSqlParser.scala:1090)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1090)



I tried various constructors for the carbon session object without success.

Note: I can create a CarbonData table and insert data from a CSV file
successfully (but I need to write CarbonData from a Spark DataFrame). It
looks like when the save method is executed it tries to create the (new)
table, and I get this error on "STORED BY"...
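For reference, a statement of the same shape as the one in the error can be
built by hand. The helper below is a hypothetical sketch (not CarbonData's
actual code); it only reproduces the shape of the generated CREATE TABLE,
which a plain SparkSession's parser rejects at the STORED BY clause:

```scala
// Hypothetical helper (not CarbonData's actual code): assembles a CREATE
// TABLE statement of the same shape as the one quoted in the ParseException.
def createTableSql(db: String, table: String,
                   cols: Seq[(String, String)],
                   partitionCols: Seq[(String, String)],
                   streaming: Boolean): String = {
  val colList  = cols.map { case (n, t) => s"$n $t" }.mkString(", ")
  val partList = partitionCols.map { case (n, t) => s"$n $t" }.mkString(", ")
  s"""CREATE TABLE IF NOT EXISTS $db.$table
     |($colList)
     |PARTITIONED BY ($partList)
     |STORED BY 'carbondata'
     |TBLPROPERTIES ('STREAMING' = '$streaming')""".stripMargin
}

val sql = createTableSql("default", "carbon_df_table_test1",
  Seq("c2" -> "STRING", "number" -> "INT"),
  Seq("c1" -> "string"),
  streaming = false)
println(sql)
```

Passing a string like this to sql() on a plain SparkSession should reproduce
the ParseException above, while a CarbonSession's extended parser accepts it.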



Regards,
Yann



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Operation not allowed: STORED BY (from Spark Dataframe save)

sraghunandan
Hi yannv,
Can you send the DataFrame API calls and the code you have used?
Did you refer to the example in TestLoadDataFrame.scala?

Are you trying from a SparkSession or a CarbonSession?

Regards
Raghu

On Tue, 14 Aug 2018, 8:44 pm yannv, <[hidden email]> wrote:

> [...]

Re: Operation not allowed: STORED BY (from Spark Dataframe save)

yannv
Hello,

I am trying to create a CarbonData table from a Spark DataFrame; however, I
am getting an error from the automatically generated CREATE TABLE statement.

I run this code in spark-shell (passing the CarbonData assembly jar for
1.4.0 as well as for the master branch), on an Azure HDInsight cluster with
Spark 2.2.1.

Code used:


import java.io.File

import org.apache.spark.sql.{SaveMode, SparkSession}

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

import org.apache.spark.sql.CarbonSession._

import spark.implicits._

val rootPath = new File(this.getClass.getResource("/").getPath).getCanonicalPath
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore"

CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
  .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
  .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE, "true")
  .addProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC, "")


val carbon = SparkSession
  .builder()
  .config("spark.sql.warehouse.dir", warehouse)
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreateCarbonSession(storeLocation, metastoredb)


val df = carbon.sparkContext.parallelize(1 to 50)
  .map(x => ("c1" + x % 10, "c2", x))
  .toDF("col1", "col2", "num")

df.write
      .format("carbondata")
      .option("tableName", "carbon_table")
      .option("partitionColumns", "col1")
      .mode(SaveMode.Overwrite)
      .save()



This is the error I am getting :

org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: STORED BY(line 5, pos 1)

== SQL ==

 CREATE TABLE IF NOT EXISTS default.carbon_df_table_test1
 (c2 STRING, number INT)
 PARTITIONED BY (c1 string)
 STORED BY 'carbondata'
-^^^

  TBLPROPERTIES ('STREAMING' = 'false')



  at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1194)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1186)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateFileFormat(SparkSqlParser.scala:1185)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1$$anonfun$31.apply(SparkSqlParser.scala:1090)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1$$anonfun$31.apply(SparkSqlParser.scala:1090)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1090)



I tried various constructors for the carbon session object without success.
Note: I can create a CarbonData table and insert data from a CSV file
successfully, but it looks like when the save is executed it tries to create
the (new) table, and I get this error on "STORED BY"...

carbon.sql("create table IF NOT EXISTS carbon_test_csv(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH '/tmp/sample.csv' INTO TABLE hive_carbon_test_csv")

Regards,
Yann






Re: Operation not allowed: STORED BY (from Spark Dataframe save)

sraghunandan
Hi Yann,
remove import spark.implicits._
and
use import carbon.implicits._ instead, after the CarbonSession has been created.
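The reason, as I understand it: toDF is an extension method supplied by a
session's implicits object, so the DataFrame ends up bound to whichever
session's implicits are in scope. With spark.implicits._ the write goes
through the plain SparkSession, whose parser does not accept STORED BY.
A dependency-free analogy (not Spark or CarbonData code) of how the imported
implicits pick the session:

```scala
// Dependency-free analogy (NOT Spark/CarbonData code): an extension method
// imported from a session's `implicits` object binds its result to that
// session, much like toDF binds a DataFrame to the session in scope.
class Session(val name: String) {
  object implicits {
    implicit class RichSeq[A](xs: Seq[A]) {
      // Stand-in for toDF: reports which session the result belongs to.
      def toDF: String = s"bound to session '$name'"
    }
  }
}

val spark  = new Session("spark")
val carbon = new Session("carbon")

// Which implicits are imported decides which session the result binds to.
val viaSpark  = { import spark.implicits._;  Seq(1, 2, 3).toDF }
val viaCarbon = { import carbon.implicits._; Seq(1, 2, 3).toDF }

println(viaSpark)   // bound to session 'spark'
println(viaCarbon)  // bound to session 'carbon'
```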


modified code:

import java.io.File

import org.apache.spark.sql.{SaveMode, SparkSession}

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

import org.apache.spark.sql.CarbonSession._

val rootPath = new File(this.getClass.getResource("/").getPath).getCanonicalPath
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore"

CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
  .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
  .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE, "true")
  .addProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC, "")

val carbon = SparkSession
  .builder()
  .config("spark.sql.warehouse.dir", warehouse)
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreateCarbonSession(storeLocation)

import carbon.implicits._

val df = carbon.sparkContext.parallelize(1 to 50)
  .map(x => ("c1" + x % 10, "c2", x))
  .toDF("col1", "col2", "num")

df.write
  .format("carbondata")
  .option("tableName", "carbon_table")
  .option("partitionColumns", "col1")
  .mode(SaveMode.Overwrite)
  .save()



Regards
Raghu

On Wed, Aug 15, 2018 at 1:05 PM yannv <[hidden email]> wrote:

> [...]

Re: Operation not allowed: STORED BY (from Spark Dataframe save)

yannv
Hi Raghu (Sraghunandan),

Thank you for your answer (use of carbon.implicits vs spark.implicits).
It is now working as expected.

Yann


