Hello,
I am trying to create a CarbonData table from a Spark DataFrame; however, I am getting an error from the automatically generated CREATE TABLE statement.

I run this code on spark-shell (passing the CarbonData assembly jar for 1.4.0 as well as the master branch), on an Azure HDInsight cluster with Spark 2.2.1.

Code used:

org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: STORED BY(line 5, pos 1)

== SQL ==

CREATE TABLE IF NOT EXISTS default.carbon_df_table_test1
(c2 STRING, number INT)
PARTITIONED BY (c1 string)
STORED BY 'carbondata'
-^^^

TBLPROPERTIES ('STREAMING' = 'false')

    at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
    at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1194)
    at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1186)
    at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
    at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateFileFormat(SparkSqlParser.scala:1185)
    at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1$$anonfun$31.apply(SparkSqlParser.scala:1090)
    at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1$$anonfun$31.apply(SparkSqlParser.scala:1090)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1090)

I tried various constructors for the carbon object without success.

Note: I can create a CarbonData table and insert data from a CSV file successfully (but I need to write CarbonData from a Spark DataFrame). It looks like when the save method is executed, it tries to create the (new) table, and I get this error on "STORED BY"...

Regards,
Yann

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Hi yannv,
Can you send the DataFrame API calls and the code you have used? Did you refer to the example in TestLoadDataFrame.scala? Are you trying from a SparkSession or a CarbonSession?

Regards,
Raghu

On Tue, 14 Aug 2018, 8:44 pm yannv, <[hidden email]> wrote:
> [quoted message elided]
Hello,
I am trying to create a CarbonData table from a Spark DataFrame; however, I am getting an error from the automatically generated CREATE TABLE statement.

I run this code on spark-shell (passing the CarbonData assembly jar for 1.4.0 as well as the master branch), on an Azure HDInsight cluster with Spark 2.2.1.

Code used:

    import java.io.File

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.carbondata.core.constants.CarbonCommonConstants
    import org.apache.carbondata.core.util.CarbonProperties
    import org.apache.spark.sql.CarbonSession._
    import spark.implicits._

    val rootPath = new File(this.getClass.getResource("/").getPath).getCanonicalPath
    val storeLocation = s"$rootPath/store"
    val warehouse = s"$rootPath/warehouse"
    val metastoredb = s"$rootPath/metastore"

    CarbonProperties.getInstance()
      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE, "true")
      .addProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC, "")

    val carbon = SparkSession
      .builder()
      .config("spark.sql.warehouse.dir", warehouse)
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreateCarbonSession(storeLocation, metastoredb)

    val df = carbon.sparkContext.parallelize(1 to 50)
      .map(x => ("c1" + x % 10, "c2", x))
      .toDF("col1", "col2", "num")

    df.write
      .format("carbondata")
      .option("tableName", "carbon_table")
      .option("partitionColumns", "col1")
      .mode(SaveMode.Overwrite)
      .save()

This is the error I am getting:

org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: STORED BY(line 5, pos 1)

== SQL ==

CREATE TABLE IF NOT EXISTS default.carbon_df_table_test1
(c2 STRING, number INT)
PARTITIONED BY (c1 string)
STORED BY 'carbondata'
-^^^

TBLPROPERTIES ('STREAMING' = 'false')

    at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
    at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateFileFormat$1.apply(SparkSqlParser.scala:1194)
    ... (same stack trace as in my first mail)

I tried various constructors for the carbon object without success.

Note: I can create a CarbonData table and insert data from a CSV file successfully, but it looks like when the save is executed it tries to create the (new) table and I get this error on "STORED BY"...

    carbon.sql("create table IF NOT EXISTS carbon_test_csv(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
    carbon.sql("LOAD DATA INPATH '/tmp/sample.csv' INTO TABLE carbon_test_csv")

Regards,
Yann
Hi Yann,
Remove "import spark.implicits._" and use "import carbon.implicits._" instead, imported after the CarbonSession has been created.

Modified code:

    import java.io.File

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.carbondata.core.constants.CarbonCommonConstants
    import org.apache.carbondata.core.util.CarbonProperties
    import org.apache.spark.sql.CarbonSession._

    val rootPath = new File(this.getClass.getResource("/").getPath).getCanonicalPath
    val storeLocation = s"$rootPath/store"
    val warehouse = s"$rootPath/warehouse"
    val metastoredb = s"$rootPath/metastore"

    CarbonProperties.getInstance()
      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE, "true")
      .addProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC, "")

    val carbon = SparkSession
      .builder()
      .config("spark.sql.warehouse.dir", warehouse)
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreateCarbonSession(storeLocation)

    import carbon.implicits._

    val df = carbon.sparkContext.parallelize(1 to 50)
      .map(x => ("c1" + x % 10, "c2", x))
      .toDF("col1", "col2", "num")

    df.write
      .format("carbondata")
      .option("tableName", "carbon_table")
      .option("partitionColumns", "col1")
      .mode(SaveMode.Overwrite)
      .save()

Regards,
Raghu

On Wed, Aug 15, 2018 at 1:05 PM yannv <[hidden email]> wrote:
> [quoted message elided]
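[Editor's note] The reason the import matters: toDF is supplied by the implicit conversions of whichever session object you import from, so the DataFrame (and the CREATE TABLE statement it later generates) is handled by that session's SQL parser. Spark's built-in parser rejects STORED BY, while the CarbonSession's extended parser accepts it. A minimal sketch of the mechanism in plain Scala (no Spark required; the Session and toFrame names below are hypothetical stand-ins, not Spark or CarbonData APIs):

    // Each Session carries its own implicits; importing them decides
    // which session "owns" any value converted through them.
    class Session(val name: String) {
      object implicits {
        implicit class RichSeq[A](xs: Seq[A]) {
          // Stand-in for toDF: records which session did the conversion.
          def toFrame: String = s"frame owned by session '${name}'"
        }
      }
    }

    val spark  = new Session("plain-spark") // parser rejects STORED BY
    val carbon = new Session("carbon")      // extended parser accepts it

    // Importing carbon's implicits ties the conversion to that session:
    import carbon.implicits._
    println(Seq(1, 2, 3).toFrame) // prints: frame owned by session 'carbon'

Had spark.implicits._ been imported instead, the same toFrame call would have gone through the plain-spark session, which mirrors why the original code hit the parser error.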
Hi Raghu (Sraghunandan),
Thank you for your answer (use of carbon.implicits._ instead of spark.implicits._). It is now working as expected.

Yann