Posted by cenyuhai
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Re-apache-incubator-carbondata-CARBONDATA-727-WIP-add-hiveintegration-for-carbon-672-tp9488p9511.html
I forgot something.
Before querying data from Hive, we should set:
set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;
(CarbonData writes table data into per-segment subdirectories under the store path, so Hive must be able to read input directories recursively.)
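For reference, a minimal sketch of applying the two settings per invocation with the Hive CLI's --hiveconf flag instead of typing them in each session (the query and table name just reuse the hive_carbon example from the steps below):
hive --hiveconf hive.mapred.supports.subdirectories=true \
     --hiveconf mapreduce.input.fileinputformat.input.dir.recursive=true \
     -e "select * from hive_carbon limit 2"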
------------------ Original ------------------
From: "261810726";<
[hidden email]>;
Date: Thu, Mar 23, 2017 09:58 PM
To: "chenliang613"<
[hidden email]>; "dev"<
[hidden email]>;
Cc: "Mention"<
[hidden email]>;
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hive integration for carbon (#672)
Hi, Liang:
I created a new "integration/hive" profile and the CI is OK now, but I still have some problems with altering the Hive metastore schema.
My steps are as follows:
1. Build CarbonData:
mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package -Phadoop-2.7.2 -Phive-1.2.1
2. Copy the jars:
mkdir ~/spark-2.1/carbon_lib
cp ~/cenyuhai/incubator-carbondata/assembly/target/scala-2.11/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar ~/spark-2.1/carbon_lib/
cp ~/cenyuhai/incubator-carbondata/integration/hive/target/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar ~/spark-2.1/carbon_lib/
3. Create sample.csv and put it into HDFS:
id,name,scale,country,salary
1,yuhai,1.77,china,33000.0
2,runlin,1.70,china,32000.0
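A minimal sketch of the upload (the target path is an assumption taken from the LOAD DATA statement in step 4):
hdfs dfs -put sample.csv hdfs://mycluster/user/hadoop/sample.csv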
4. Create the table in Spark:
spark-shell --jars "/data/hadoop/spark-2.1/carbon_lib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/data/hadoop/spark-2.1/carbon_lib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar"
#execute these commands:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val rootPath = "hdfs:///user/hadoop/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"
val carbon = SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation).getOrCreateCarbonSession(storeLocation, metastoredb)
carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/hadoop/sample.csv' INTO TABLE hive_carbon")
5. Alter the table schema in Hive. (A table created through the CarbonSession initially appears in the Hive metastore with a single placeholder column "col", which is why the commands below first rename "col" to "id" and then add the remaining columns.)
# copy both the shaded carbondata jar and the carbondata-hive jar from step 2
cp ~/spark-2.1/carbon_lib/carbondata*.jar hive/auxlibs/
cp spark-catalyst*.jar hive/auxlibs/
export HIVE_AUX_JARS_PATH=hive/auxlibs/
# start the hive cli
$HIVE_HOME/bin/hive
# execute these commands:
alter table hive_carbon set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";
alter table hive_carbon set LOCATION 'hdfs://mycluster-tj/user/hadoop/carbon/store/default/hive_carbon';
alter table hive_carbon change col id INT;
alter table hive_carbon add columns(name string, scale decimal, country string, salary double);
6. Check the table schema:
execute "show create table hive_carbon"
7. Execute "select * from hive_carbon" and "select * from hive_carbon order by id".
8. The table is still available in Spark.
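A quick way to verify step 8, as a minimal sketch using the same "carbon" session from step 4 (these exact statements are an assumption, not part of the original steps):
carbon.sql("describe hive_carbon").show(false)
carbon.sql("select * from hive_carbon order by id").show()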
------------------ Original ------------------
From: "Liang Chen";<
[hidden email]>;
Date: Thu, Mar 23, 2017 00:09 AM
To: "apache/incubator-carbondata"<
[hidden email]>;
Cc: "Sea"<
[hidden email]>; "Mention"<
[hidden email]>;
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hive integration for carbon (#672)
@cenyuhai Thank you for contributing this feature.
I suggest creating a new profile for the "integration/hive" module and decoupling all Hive-related code from the current modules, so that CI can run normally first.