Posted by cenyuhai
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Re-apache-incubator-carbondata-CARBONDATA-727-WIP-add-hiveintegration-for-carbon-672-tp9488p9511.html
I forgot something.
Before querying data from Hive, we should set:
set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;
(CarbonData writes table data into per-segment subdirectories under the store path, so Hive must be able to read input directories recursively.)
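For reference, a minimal sketch of applying the two settings per invocation with the Hive CLI's --hiveconf flag instead of typing them in each session (the query and table name just reuse the hive_carbon example from the steps below):
hive --hiveconf hive.mapred.supports.subdirectories=true \
     --hiveconf mapreduce.input.fileinputformat.input.dir.recursive=true \
     -e "select * from hive_carbon limit 2"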
------------------ Original ------------------
From: "261810726";<
[hidden email]>;
Date: Thu, Mar 23, 2017 09:58 PM
To: "chenliang613"<
[hidden email]>; "dev"<
[hidden email]>;
Cc: "Mention"<
[hidden email]>;
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hive integration for carbon (#672)
Hi, Liang:
I created a new "integration/hive" profile and the CI is OK now, but I still have some problems with altering the Hive metastore schema.
My steps are as follows:
1. Build CarbonData:
mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package -Phadoop-2.7.2 -Phive-1.2.1
2. Copy the jars:
mkdir ~/spark-2.1/carbon_lib
cp ~/cenyuhai/incubator-carbondata/assembly/target/scala-2.11/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar ~/spark-2.1/carbon_lib/
cp ~/cenyuhai/incubator-carbondata/integration/hive/target/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar ~/spark-2.1/carbon_lib/
3. Create sample.csv and put it into HDFS:
id,name,scale,country,salary
1,yuhai,1.77,china,33000.0
2,runlin,1.70,china,32000.0
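A minimal sketch of the upload (the target path is an assumption taken from the LOAD DATA statement in step 4):
hdfs dfs -put sample.csv hdfs://mycluster/user/hadoop/sample.csv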
4. Create the table in Spark:
spark-shell --jars "/data/hadoop/spark-2.1/carbon_lib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/data/hadoop/spark-2.1/carbon_lib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar"
#execute these commands:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val rootPath = "hdfs:///user/hadoop/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"
val carbon = SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation).getOrCreateCarbonSession(storeLocation, metastoredb)
carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/hadoop/sample.csv' INTO TABLE hive_carbon")
5. Alter the table schema in Hive. (A table created through the CarbonSession initially appears in the Hive metastore with a single placeholder column "col", which is why the commands below first rename "col" to "id" and then add the remaining columns.)
# copy both the shaded carbondata jar and the carbondata-hive jar from step 2
cp ~/spark-2.1/carbon_lib/carbondata*.jar hive/auxlibs/
cp spark-catalyst*.jar hive/auxlibs/
export HIVE_AUX_JARS_PATH=hive/auxlibs/
# start the hive cli
$HIVE_HOME/bin/hive
# execute these commands:
alter table hive_carbon set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";
alter table hive_carbon set LOCATION 'hdfs://mycluster-tj/user/hadoop/carbon/store/default/hive_carbon';
alter table hive_carbon change col id INT;
alter table hive_carbon add columns(name string, scale decimal, country string, salary double);
6. Check the table schema:
execute "show create table hive_carbon"
7. Execute "select * from hive_carbon" and "select * from hive_carbon order by id".
8. The table is still available in Spark.
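A quick way to verify step 8, as a minimal sketch using the same "carbon" session from step 4 (these exact statements are an assumption, not part of the original steps):
carbon.sql("describe hive_carbon").show(false)
carbon.sql("select * from hive_carbon order by id").show()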
------------------ Original ------------------
From: "Liang Chen";<
[hidden email]>;
Date: Thu, Mar 23, 2017 00:09 AM
To: "apache/incubator-carbondata"<
[hidden email]>;
Cc: "Sea"<
[hidden email]>; "Mention"<
[hidden email]>;
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hive integration for carbon (#672)
@cenyuhai Thank you for contributing this feature.
I suggest creating a new profile for the "integration/hive" module and decoupling all Hive-related code from the current modules, so that CI can run normally first.