Carbondata integration Plan


Carbondata integration Plan

cenyuhai
Hi all,
    For the first phase, I think supporting reading carbon tables in Hive is enough.
    We still have some work to do:
    1、Make the carbon schema compatible with Hive (CARBONDATA-1008) (create table and alter table)
    2、Filter pushdown (especially partition filters; see FilterPushdownDev). An example of the kind of query this targets follows this list.
    3、A tool to update existing tables' schemas to be compatible with Hive.
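
To illustrate point 2, here is a hedged sketch of the kind of Hive query where partition filter pushdown matters; the table and column names are hypothetical, not from this thread:

-- If the partition filter is pushed down, only the dt='2017-06-01'
-- partition should be scanned instead of the whole table.
SELECT city, COUNT(*)
FROM sales_carbon
WHERE dt = '2017-06-01'
GROUP BY city;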


    Do you have any ideas?

Re: Carbondata hive integration Plan

Liang Chen
Administrator
Hi cenyuhai

Thanks for starting this discussion about Hive integration:

    1、Make the carbon schema compatible with Hive (CARBONDATA-1008) (create table and alter table)

    Liang: As you mentioned, the first phase (1.2.0) supports reading carbondata files in Hive. Can I understand the flow like this: a) all steps of preparing carbondata files are handled in Spark, so "create table and alter table" would be handled in Spark; b) in Hive, only read (query). Could you explain a little more about which part the schema compatibility with Hive is meant for?

    2、Filter pushdown (especially partition filters; see FilterPushdownDev)
    Liang: LGTM for this point.

    3、A tool to update existing tables' schemas to be compatible with Hive.
    Liang: Same comment as for question 1. Can you give some examples of "the existing tables' schema"?


For the Hive integration feature in Apache CarbonData 1.2.0, I propose the scope below:
1. Only support reading/querying carbondata files in Hive. Writing carbondata in Hive (create carbon table, alter carbon table, load data, etc.) will be supported in the future (a new mailing list topic can be started to discuss that plan).
2. Utilize CarbonData's good features (index, dictionary, ...) to get good query performance. Hive+CarbonData performance should be better than Hive+ORC.
3. Provide a solution/tool to migrate all Hive tables & data to carbon tables & data in Spark (a sketch of this follows below).
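
As a hedged sketch of point 3, the migration could be driven from Spark SQL along these lines; the table names and columns are hypothetical, and the create-table syntax (STORED BY 'carbondata') is assumed from the CarbonData Spark integration:

-- Create the target carbon table in Spark SQL (hypothetical schema).
CREATE TABLE sales_carbon (
  id INT,
  name STRING,
  amount DOUBLE
)
STORED BY 'carbondata';

-- Copy the data over from the existing Hive table.
INSERT INTO TABLE sales_carbon
SELECT id, name, amount FROM sales_hive;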

Regards
Liang


   

Re: Carbondata hive integration Plan

anubhavtarar
Hi cenyuhai, can you tell us why a tool would be required? You already have a PR
for making the carbon schema compatible with Hive (CARBONDATA-1008). What will this tool do?

@Liang Hi, by "existing tables' schema" cenyuhai means that when you are
reading a carbondata table from Hive, you need to alter the schema of that
carbon table to use MapredCarbonInputFormat and MapredCarbonOutputFormat,
which are compatible with Hive, using the following steps:

alter table CHARTYPES3 set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";

alter table CHARTYPES3 set LOCATION
'hdfs://localhost:54310/opt/carbonStore/default/CHARTYPES3' ;
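
After those two ALTER statements the table should be readable from Hive; a minimal, hypothetical sanity check (not from the original thread) would be:

-- Run from the Hive CLI or beeline once the input/output formats,
-- SerDe and location point at the carbon store.
SELECT COUNT(*) FROM CHARTYPES3;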






--
Thanks and Regards
Anubhav Tarar
Software Consultant, Knoldus Software LLP

Re: Carbondata hive integration Plan

cenyuhai
Hi anubhav,
    You are right, this tool is unnecessary.



