[jira] [Updated] (CARBONDATA-1377) Implement hive partition

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkata Ramana G updated CARBONDATA-1377:
-----------------------------------------
Description:
The current partition implementation works like a database. To replace Parquet with CarbonData at scale, CarbonData must be usable in the same way as Parquet/ORC.

Hive users should be able to switch to CarbonData for all new partitions being created. Hive supports specifying the file format at the partition level.
Example:
{code:sql}
create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) stored as parquet;
insert into rtestpartition partition(col3=10) select "pqt", 1;
insert into rtestpartition partition(col3=20) select "pqt", 1;
insert into rtestpartition partition(col3=10) select "pqt", 1;
insert into rtestpartition partition(col3=20) select "pqt", 1;
{code}

{noformat}
Hive creates folders like:
/db1/table1/col3=10/0001_file.pqt
/db1/table1/col3=10/0002_file.pqt
/db1/table1/col3=20/0001_file.pqt
/db1/table1/col3=20/0002_file.pqt
{noformat}

Hive users can now switch new partitions to CarbonData; however, old partitions remain in Parquet and require migration scripts to move them to the CarbonData format.

{code:sql}
alter table rtestpartition set fileformat carbondata;

insert into rtestpartition partition(col3=30) select "cdata", 1;
insert into rtestpartition partition(col3=40) select "cdata", 1;
{code}

{noformat}
Hive creates folders like:
/db1/table1/col3=10/0001_file.pqt
/db1/table1/col3=10/0002_file.pqt
/db1/table1/col3=20/0001_file.pqt
/db1/table1/col3=20/0002_file.pqt
/db1/table1/col3=30/<carbondatafiles>
/db1/table1/col3=40/<carbondatafiles>
{noformat}
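
The behaviour described above (a table-level format change affects only partitions created afterwards) can be pictured with a small model. This is only an illustrative sketch, not Hive or CarbonData code: the class `PartitionedTable` and its methods are hypothetical names introduced here, and it assumes each partition records the table's default format at the moment it is first created.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedTable:
    """Toy model (hypothetical): a table-level default format plus the
    format recorded for each partition when that partition is created."""
    name: str
    default_format: str
    partition_formats: dict = field(default_factory=dict)

    def insert(self, partition_value: int) -> str:
        # A new partition records the table's current default format;
        # an existing partition keeps the format it was created with.
        return self.partition_formats.setdefault(partition_value, self.default_format)

    def set_fileformat(self, new_format: str) -> None:
        # Only the default changes; existing partitions are not rewritten.
        self.default_format = new_format

    def partitions_needing_migration(self) -> list:
        # Partitions whose recorded format differs from the current default.
        return sorted(p for p, fmt in self.partition_formats.items()
                      if fmt != self.default_format)


table = PartitionedTable("rtestpartition", "parquet")
for col3 in (10, 20, 10, 20):       # the four parquet inserts from the example
    table.insert(col3)

table.set_fileformat("carbondata")  # mirrors: alter table ... set fileformat carbondata
for col3 in (30, 40):
    table.insert(col3)

print(table.partition_formats)
print(table.partitions_needing_migration())
```

In this model, partitions 10 and 20 stay in parquet and are the ones a migration script would have to rewrite, while 30 and 40 are created as carbondata.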

was:
Current partition implement is like database, If I want to use carbon to replace parquet massively, we must make the usage of carbon the same with parquet/orc.

Hive users should able to switch to CarbonData for all the new partitions being created. Hive support format to be specified at partition level.
Example:
{code:sql}
create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) stored as parquet;
insert into rtestpartition partition(col3=10) select "pqt", 1;
insert into rtestpartition partition(col3=20) select "pqt", 1;
insert into rtestpartition partition(col3=10) select "pqt", 1;
insert into rtestpartition partition(col3=20) select "pqt", 1;
{code}

{noformat}
hive creates folder like
/db1/table1/col3=10/0001_file.pqt
0002_file.pqt
/db1/table1/col3=20/0001_file.pqt
0002_file.pqt
{noformat}

Hive users can now change new partitions to CarbonData, how ever old partitions still be with parquet and require migration scripts to move to CarbonData format.

alter table rtestpartition set fileformat carbondata;

insert into rtestpartition partition(col3=30) select "cdata", 1;
insert into rtestpartition partition(col3=30) select "cdata", 1;
insert into rtestpartition partition(col3=40) select "cdata", 1;
insert into rtestpartition partition(col3=40) select "cdata", 1;

> Implement hive partition
> ------------------------
>
> Key: CARBONDATA-1377
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1377
> Project: CarbonData
> Issue Type: Sub-task
> Components: hive-integration
> Reporter: cen yuhai
>
> The current partition implementation works like a database. To replace Parquet with CarbonData at scale, CarbonData must be usable in the same way as Parquet/ORC.
> Hive users should be able to switch to CarbonData for all new partitions being created. Hive supports specifying the file format at the partition level.
> Example:
> {code:sql}
> create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) stored as parquet;
> insert into rtestpartition partition(col3=10) select "pqt", 1;
> insert into rtestpartition partition(col3=20) select "pqt", 1;
> insert into rtestpartition partition(col3=10) select "pqt", 1;
> insert into rtestpartition partition(col3=20) select "pqt", 1;
> {code}
> {noformat}
> Hive creates folders like:
> /db1/table1/col3=10/0001_file.pqt
> /db1/table1/col3=10/0002_file.pqt
> /db1/table1/col3=20/0001_file.pqt
> /db1/table1/col3=20/0002_file.pqt
> {noformat}
> Hive users can now switch new partitions to CarbonData; however, old partitions remain in Parquet and require migration scripts to move them to the CarbonData format.
> {code:sql}
> alter table rtestpartition set fileformat carbondata;
> insert into rtestpartition partition(col3=30) select "cdata", 1;
> insert into rtestpartition partition(col3=40) select "cdata", 1;
> {code}
> {noformat}
> Hive creates folders like:
> /db1/table1/col3=10/0001_file.pqt
> /db1/table1/col3=10/0002_file.pqt
> /db1/table1/col3=20/0001_file.pqt
> /db1/table1/col3=20/0002_file.pqt
> /db1/table1/col3=30/<carbondatafiles>
> /db1/table1/col3=40/<carbondatafiles>
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)