Apache CarbonData Dev Mailing List archive

About hive integration

cenyuhai

Dec 04, 2016; 7:00am

About hive integration

Hi, all:
CarbonData does not currently work with Hive, which is the most widely used query engine. At my company, adopting CarbonData requires being able to query CarbonData tables from Hive.
I think we should implement the following features for Hive:
1. DDL: create/drop/alter CarbonData tables
2. DML: insert (overwrite) / select

What do you think?

ravipesala

Dec 04, 2016; 5:51pm

Re: About hive integration

Hi,

Yes, we plan to integrate CarbonData with the Hive engine, but it is not a
high priority right now, so we will take this task up gradually. Any
contributions towards it are welcome.

Regards,
Ravi


Liang Chen

Dec 09, 2016; 3:56am

Re: About hive integration

Administrator

In reply to this post by cenyuhai

Hi

Agreed. Hive is widely used; there is consensus on that. The Apache CarbonData community already plans to support Hive integration, and we look forward to your contributions to it as well :)

Regards
Liang


cenyuhai

Dec 09, 2016; 8:27am

Re: About hive integration

It looks like we only need to implement CarbonFileStorageFormatDescriptor and CarbonHiveSerde;
CarbonInputformat/CarbonOutputformat already exist in the master branch.

@Liang, can you create a module for Hive?

For reference, this is Hive's Parquet descriptor:

// Same package as AbstractStorageFormatDescriptor and IOConstants, so neither needs an import.
package org.apache.hadoop.hive.ql.io;

import java.util.Set;

import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;

import com.google.common.collect.ImmutableSet;

public class ParquetFileStorageFormatDescriptor extends AbstractStorageFormatDescriptor {
  // Names accepted in "STORED AS <name>"
  @Override
  public Set<String> getNames() {
    return ImmutableSet.of(IOConstants.PARQUETFILE, IOConstants.PARQUET);
  }

  // Class name of the InputFormat used to read the table
  @Override
  public String getInputFormat() {
    return MapredParquetInputFormat.class.getName();
  }

  // Class name of the OutputFormat used to write the table
  @Override
  public String getOutputFormat() {
    return MapredParquetOutputFormat.class.getName();
  }

  // Class name of the SerDe that converts between rows and Hive objects
  @Override
  public String getSerde() {
    return ParquetHiveSerDe.class.getName();
  }
}
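
Following the same pattern, a Carbon descriptor might look roughly like the sketch below. This is only an illustration, not existing CarbonData code: the abstract base class is a local stand-in for Hive's AbstractStorageFormatDescriptor so that the snippet compiles without Hive on the classpath, and the storage name "CARBONDATA", the package org.example.carbondata.hive and the fully qualified class names are all placeholders. Only the simple names CarbonInputformat, CarbonOutputformat and CarbonHiveSerde come from this thread.

```java
import java.util.Collections;
import java.util.Set;

// Local stand-in for Hive's AbstractStorageFormatDescriptor (assumption:
// only the four methods used by the Parquet example are mirrored here).
abstract class StorageFormatDescriptorStandIn {
  abstract Set<String> getNames();
  abstract String getInputFormat();
  abstract String getOutputFormat();
  abstract String getSerde();
}

// Hypothetical Carbon descriptor; every string below is a placeholder.
class CarbonFileStorageFormatDescriptorSketch extends StorageFormatDescriptorStandIn {
  // Placeholder package; the real module and package are not decided in this thread.
  private static final String PKG = "org.example.carbondata.hive.";

  // Name that "STORED AS <name>" would accept (placeholder).
  @Override
  Set<String> getNames() {
    return Collections.singleton("CARBONDATA");
  }

  // Class names are returned as strings, as in the Parquet descriptor.
  @Override
  String getInputFormat() {
    return PKG + "CarbonInputformat";
  }

  @Override
  String getOutputFormat() {
    return PKG + "CarbonOutputformat";
  }

  @Override
  String getSerde() {
    return PKG + "CarbonHiveSerde";
  }
}
```

In a real module the class would extend Hive's AbstractStorageFormatDescriptor directly and return SomeClass.class.getName() instead of string literals. As far as I understand, Hive discovers descriptors through a Java ServiceLoader entry for org.apache.hadoop.hive.ql.io.StorageFormatDescriptor, so the module would also need that registration file.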
