Hi Liang,

I created a new profile "integration/hive" and the CI is OK now, but I still have some problems altering the Hive metastore schema. My steps are as follows:

1. Build CarbonData:

mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package -Phadoop-2.7.2 -Phive-1.2.1

2. Copy the jars:

mkdir ~/spark-2.1/carbon_lib
cp ~/cenyuhai/incubator-carbondata/assembly/target/scala-2.11/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar ~/spark-2.1/carbon_lib/
cp ~/cenyuhai/incubator-carbondata/integration/hive/target/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar ~/spark-2.1/carbon_lib/

3. Create sample.csv and put it into HDFS:

id,name,scale,country,salary
1,yuhai,1.77,china,33000.0
2,runlin,1.70,china,32000.0

4. Create the table in Spark:

spark-shell --jars "/data/hadoop/spark-2.1/carbon_lib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/data/hadoop/spark-2.1/carbon_lib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar"

# execute these commands:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val rootPath = "hdfs:////user/hadoop/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"

val carbon = SparkSession.builder().enableHiveSupport()
  .config("spark.sql.warehouse.dir", warehouse)
  .config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation)
  .getOrCreateCarbonSession(storeLocation, metastoredb)

carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/hadoop/sample.csv' INTO TABLE hive_carbon")

5. Alter the table schema in Hive:

cp ~/spark-2.1/carbon_lib/carbon-assembly-*.jar hive/auxlibs/
cp spark-catalyst*.jar hive/auxlibs/
export HIVE_AUX_JARS_PATH=hive/auxlibs/

# start the Hive CLI
$HIVE_HOME/bin/hive

# execute commands:
alter table hive_carbon set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";

alter table hive_carbon set LOCATION 'hdfs://mycluster-tj/user/hadoop/carbon/store/default/hive_carbon';
alter table hive_carbon change col id INT;
alter table hive_carbon add columns(name string, scale decimal, country string, salary double);

6. Check the table schema: execute "show create table hive_carbon".

7. Execute "select * from hive_carbon" and "select * from hive_carbon order by id".

8. The table is still available in Spark.

------------------ Original ------------------
From: "Liang Chen";<[hidden email]>;
Date: Thu, Mar 23, 2017 00:09 AM
To: "apache/incubator-carbondata"<[hidden email]>;
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hive integration for carbon (#672)

@cenyuhai Thank you for contributing this feature. I suggest creating a new profile for the "integration/hive" module, and decoupling all Hive-related code from the current modules, so that CI runs normally first.
Hi,

Thanks for your great contributions.

Regards,
Liang
In reply to this post by cenyuhai
I forgot something.
Before querying data from Hive, we should set:

set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;

------------------ Original ------------------
From: "261810726";<[hidden email]>;
Date: Thu, Mar 23, 2017 09:58 PM
To: "chenliang613"<[hidden email]>; "dev"<[hidden email]>;
Cc: "Mention"<[hidden email]>;
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hive integration for carbon (#672)
@sea hi, I tried to use Hive following the steps you mentioned in your PR, but I get a "table not found" exception from the Hive CLI. Here are the steps I used:

1. Start the spark-shell with the Hive and Carbon builds:

./spark-shell --jars /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar

2. Create the CarbonSession, then create and load the table:

scala> import org.apache.spark.sql.CarbonSession._
scala> import org.apache.spark.sql.SparkSession
scala> val carbon = SparkSession.builder().enableHiveSupport().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:54310/opt/carbonStore")
scala> carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
scala> carbon.sql("LOAD DATA INPATH 'hdfs://localhost:54310/sample.csv' INTO TABLE hive_carbon")

3. Start the Hive CLI and add the jars (each jar was reported as added to the class path):

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar;
hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar;
hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar;

4. Query data using Hive:

hive> select * from hive_carbon;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'hive_carbon'

On Fri, Mar 24, 2017 at 9:30 AM, Sea <[hidden email]> wrote:
> I forgot something.
> Before query data from hive. We should set
> set hive.mapred.supports.subdirectories=true;
> set mapreduce.input.fileinputformat.input.dir.recursive=true;

--
Thanks and Regards
Anubhav Tarar
Software Consultant
Knoldus Software LLP
Hi Anubhav,

Do you use MySQL to store the Hive metadata? Spark SQL and Hive must use the same metastore.

PS: Before you query data using Hive, you should alter the table schema. This is the latest guide:
https://github.com/cenyuhai/incubator-carbondata/blob/CARBONDATA-727/integration/hive/hive-guide.md

------------------ Original ------------------
From: "Anubhav Tarar";<[hidden email]>;
Date: Mon, Mar 27, 2017 02:59 PM
To: "dev"<[hidden email]>;
Cc: "chenliang613"<[hidden email]>; "Mention"<[hidden email]>;
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hive integration for carbon (#672)
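The shared-metastore requirement can be made concrete: both the Hive CLI and the CarbonSession built in spark-shell must read the same hive-site.xml, otherwise Spark falls back to a local Derby metastore_db that Hive never sees, which produces exactly a "table not found" error in Hive. A minimal sketch, assuming a MySQL-backed metastore (the host name, database name, and credentials below are placeholders, not values from this thread):

```xml
<!-- hive-site.xml: must be on the classpath of BOTH Hive and Spark
     (e.g. also copied into $SPARK_HOME/conf so that
     enableHiveSupport() picks it up).
     Host, database, user, and password are placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
```

With this in place, a table created through the CarbonSession should show up in `show tables;` from the Hive CLI.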