Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)


Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

cenyuhai
Hi Liang,
    I created a new profile for "integration/hive" and CI is OK now, but I still have some problems altering the Hive metastore schema.
    My steps are as follows:
1. Build CarbonData

mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package -Phadoop-2.7.2 -Phive-1.2.1

2. Copy the jars

mkdir ~/spark-2.1/carbon_lib
cp ~/cenyuhai/incubator-carbondata/assembly/target/scala-2.11/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar ~/spark-2.1/carbon_lib/
cp ~/cenyuhai/incubator-carbondata/integration/hive/target/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar ~/spark-2.1/carbon_lib/

3. Create sample.csv and put it into HDFS

id,name,scale,country,salary
1,yuhai,1.77,china,33000.0
2,runlin,1.70,china,32000.0
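
One minimal way to upload it, assuming the same HDFS path that the LOAD DATA statement in step 4 uses:

hdfs dfs -put sample.csv hdfs://mycluster/user/hadoop/sample.csv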

4. Create the table in Spark

spark-shell --jars "/data/hadoop/spark-2.1/carbon_lib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/data/hadoop/spark-2.1/carbon_lib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar"

#execute these commands:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val rootPath = "hdfs:////user/hadoop/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"

val carbon = SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation).getOrCreateCarbonSession(storeLocation, metastoredb)

carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/hadoop/sample.csv' INTO TABLE hive_carbon")

5. Alter the table schema in Hive

cp ~/spark-2.1/carbon_lib/carbondata*.jar hive/auxlibs/
cp spark-catalyst*.jar hive/auxlibs/
export HIVE_AUX_JARS_PATH=hive/auxlibs/

#start hive cli
$HIVE_HOME/bin/hive

#execute commands:
alter table hive_carbon set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";

alter table hive_carbon set LOCATION 'hdfs://mycluster-tj/user/hadoop/carbon/store/default/hive_carbon';
alter table hive_carbon change col id INT;  
alter table hive_carbon add columns(name string, scale decimal, country string, salary double);
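
#optional: to confirm the format change took effect, "describe formatted" (standard Hive) prints the SerDe and input/output format classes
describe formatted hive_carbon;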


6. Check the table schema

Execute "show create table hive_carbon".


7. Execute "select * from hive_carbon" and "select * from hive_carbon order by id"
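
Given the two rows in sample.csv, both queries should return output along these lines (exact formatting will differ):

1       yuhai   1.77    china   33000.0
2       runlin  1.70    china   32000.0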

8. The table is still available in Spark.

------------------ Original ------------------
From: "Liang Chen" <[hidden email]>
Date: Thu, Mar 23, 2017 00:09 AM
To: "apache/incubator-carbondata" <[hidden email]>
Cc: "Sea" <[hidden email]>; "Mention" <[hidden email]>
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

@cenyuhai Thank you for contributing this feature.
I suggest creating a new profile for the "integration/hive" module and decoupling all Hive-related code from the current modules, so that CI can run normally first.




Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

Liang Chen
Administrator
Hi

Thanks for your great contributions.

Regards
Liang


Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

cenyuhai
In reply to this post by cenyuhai
I forgot something. Before querying data from Hive, you should set:

set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;

These settings are needed because CarbonData writes its data files into segment subdirectories under the table location, so Hive has to scan input directories recursively.


Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

anubhavtarar
@Sea Hi, I tried to use Hive following the steps you mentioned in your PR, but I get a "Table not found" exception from the Hive CLI. Here are the steps I used:

1. Start the spark-shell with the Hive and Carbon builds

./spark-shell --jars /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar

2. Create the CarbonSession, then create and load the table

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> val carbon = SparkSession.builder().enableHiveSupport().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:54310/opt/carbonStore")

scala> carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
scala> carbon.sql("LOAD DATA INPATH 'hdfs://localhost:54310/sample.csv' INTO TABLE hive_carbon")

3. Start the Hive CLI and add the jars

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar]

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar]

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar]


4. Query data using Hive

hive> select * from hive_carbon;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'hive_carbon'

--
Thanks and Regards,
Anubhav Tarar
Software Consultant, Knoldus Software LLP <http://www.knoldus.com/home.knol>

Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] addhiveintegration for carbon (#672)

cenyuhai
Hi Anubhav,
    Do you use MySQL to store the Hive metadata? Spark SQL and Hive must use the same metastore; otherwise a table created through the CarbonSession is not visible from the Hive CLI.
    PS: Before you query data using Hive, you should alter the table schema as in step 5 above.

This is the latest guide:
https://github.com/cenyuhai/incubator-carbondata/blob/CARBONDATA-727/integration/hive/hive-guide.md
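
For reference, a minimal sketch of sharing one MySQL-backed metastore between Hive and Spark: put the same hive-site.xml into both $HIVE_HOME/conf and $SPARK_HOME/conf. The property names below are the standard Hive metastore keys; the host, database name, and credentials are placeholders:

<configuration>
  <!-- JDBC connection to the shared metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive_metastore</value>
  </property>
  <!-- MySQL JDBC driver; the connector jar must be on the classpath -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>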