Query Carbondata table from Hive (InvalidConfigurationException: Database name is not set)

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Query Carbondata table from Hive (InvalidConfigurationException: Database name is not set)

yannv
Hello,

I have created a carbondata table from Spark 2.2.1 on Azure (Hive 1.2.1) via
CarbonSession.

The Spark code looks like this :

val carbon = SparkSession.builder().config("spark.sql.warehouse.dir",
warehouse).config("spark.sql.crossJoin.enabled",
"true").enableHiveSupport().getOrCreateCarbonSession(storeLocation)

val df = carbon.sparkContext.parallelize(1 to 1000000).map(x => .... <my
object creation>)

import carbon.implicits._

df.write.format("carbondata").option("tableName",
"carbon_table_20180816_1m_50f_p8").mode(SaveMode.Overwrite).save()


Note : I can query this carbondata table without problems in Spark, but I
also need to query it in Hive...



However I cannot query it from Hive.

If I look at the table definition in Hive I see this :

hive> show create table carbon_table_20180816_1m_50f_p8;
OK
CREATE EXTERNAL TABLE `carbon_table_20180816_1m_50f_p8`(
  `col` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'carbonSchemaPartsNo'='6',
  'dbName'='default',
  'isExternal'='false',
  'isTransactional'='true',
  'isVisible'='true',
 
'path'='hdfs://hostname.internal.cloudapp.net/tmp/store/default/carbon_table_20180816_1m_50f_p8',
  'tableName'='carbon_table_20180816_1m_50f_p8',
 
'tablePath'='hdfs://hostname.internal.cloudapp.net/tmp/store/default/carbon_table_20180816_1m_50f_p8')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
 
'wasb://[hidden email]/hive/warehouse/carbon_table_20180816_1m_50f_p8-__PLACEHOLDER__'
TBLPROPERTIES (
  'spark.sql.sources.provider'='org.apache.spark.sql.CarbonSource',
  'spark.sql.sources.schema.numParts'='1',
 
'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"c1\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c2\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c3\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c4\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c5\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c6\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c7\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c8\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c9\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c10\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c11\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c12\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c13\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c14\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c15\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c16\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c17\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c18\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c19\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c20\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c21\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c22\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c23\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c24\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c25\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c26\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c27\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c28\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c29\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c30\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c31\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c32\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c33\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c34\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c35\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c36\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c37\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c38\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c39\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c40\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c41\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c42\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c43\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c44\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c45\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c46\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c47\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c48\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c49\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c50\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}',
  'transient_lastDdlTime'='1534416196')


It is clear that it is now using the Hive Carbondata SerDe.

I applied the following modifications based on
https://github.com/cenyuhai/incubator-carbondata/blob/CARBONDATA-727/integration/hive/hive-guide.md#alter-schema-in-hive



hive> alter table carbon_table_20180816_1m_50f_p8  set FILEFORMAT
    > INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
    > OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
    > SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";
OK


hive> alter table carbon_table_20180816_1m_50f_p8 set LOCATION
'hdfs://hostname.ax.internal.cloudapp.net/tmp/store/default/carbon_table_20180816_1m_50f_p8';
OK



hive> alter table carbon_table_20180816_1m_50f_p8 change col c1 string;
OK




Following these alter operations, I obtain the following table structure :


hive> show create table carbon_table_20180816_1m_50f_p8;
OK
CREATE EXTERNAL TABLE `carbon_table_20180816_1m_50f_p8`(
  `c1` string COMMENT '')
ROW FORMAT SERDE
  'org.apache.carbondata.hive.CarbonHiveSerDe'
WITH SERDEPROPERTIES (
  'carbonSchemaPartsNo'='6',
  'dbName'='default',
  'isExternal'='false',
  'isTransactional'='true',
  'isVisible'='true',
 
'path'='hdfs://hostname.internal.cloudapp.net/tmp/store/default/carbon_table_20180816_1m_50f_p8',
  'tableName'='carbon_table_20180816_1m_50f_p8',
 
'tablePath'='hdfs://hostname.internal.cloudapp.net/tmp/store/default/carbon_table_20180816_1m_50f_p8')
STORED AS INPUTFORMAT
  'org.apache.carbondata.hive.MapredCarbonInputFormat'
OUTPUTFORMAT
  'org.apache.carbondata.hive.MapredCarbonOutputFormat'
LOCATION
 
'hdfs://hostname.internal.cloudapp.net/tmp/store/default/carbon_table_20180816_1m_50f_p8'
TBLPROPERTIES (
  'last_modified_by'='sshuser',
  'last_modified_time'='1534423308',
  'numFiles'='7',
  'spark.sql.sources.provider'='org.apache.spark.sql.CarbonSource',
  'spark.sql.sources.schema.numParts'='1',
 
'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"c1\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c2\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c3\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c4\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c5\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c6\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c7\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c8\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c9\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c10\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c11\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c12\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c13\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c14\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c15\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c16\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c17\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c18\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c19\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c20\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c21\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c22\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c23\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c24\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c25\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c26\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c27\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c28\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c29\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c30\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c31\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c32\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c33\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c34\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c35\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c36\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c37\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c38\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c39\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c40\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c41\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c42\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c43\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c44\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c45\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c46\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c47\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c48\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c49\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c50\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}',
  'totalSize'='11446940',
  'transient_lastDdlTime'='1534423308')


Before adding the extra columns (c2 to c50) I tried a basic select query,
and I am getting the following error :

hive> select c1 from carbon_table_20180816_1m_50f_p8 limit 1;
OK
Failed with exception java.io.IOException:java.io.IOException:
org.apache.carbondata.core.exception.InvalidConfigurationException: Database
name is not set.


The dataname seems to be well defined in 'dbName'='default' ...


Is there a better, easier way to query Carbondata tables (created from Spark
SQL) directly from Hive??


Your help would be appreciated.

Thanks

Yann



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Query Carbondata table from Hive (InvalidConfigurationException: Database name is not set)

yannv
Hello,

I would really appreciate your help on this error
InvalidConfigurationException: Database name is not set.

As anyone tried to read Carbondata from Hive on Azure?

I don't know whether this problem comes from Carbondata itself, the SerDe
properties have the dbName defined, however the error message seems to
indicate the SerDe cannot find the value.

Am I missing something obvious?

I also tried the example on
https://github.com/apache/carbondata/blob/master/integration/hive/hive-guide.md
without success.

Could you please help me figure out thatis going wrong with the Hive
integration?

Yann



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Query Carbondata table from Hive (InvalidConfigurationException: Database name is not set)

yannv
Ok, solved it.

I had to specify mapreduce.input.carboninputformat.databaseName as well as
mapreduce.input.carboninputformat.tableName in serde properties... did not
find this in the documentation by in the code itself
(org/apache/carbondata/hadoop/api/CarbonInputFormat.java)

Thanks anyway.

Yann



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/