[jira] [Updated] (CARBONDATA-1377) Implement hive partition


Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkata Ramana G updated CARBONDATA-1377:
-----------------------------------------
    Description:
The current partition implementation behaves like a database. To replace Parquet with CarbonData at scale, CarbonData must be usable in the same way as Parquet/ORC.

Hive users should be able to switch to CarbonData for all newly created partitions. Hive supports specifying the file format at the partition level.
Example:
{code:sql}
create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) stored as parquet;
insert into rtestpartition partition(col3=10) select "pqt", 1;
insert into rtestpartition partition(col3=20) select "pqt", 1;
insert into rtestpartition partition(col3=10) select "pqt", 1;
insert into rtestpartition partition(col3=20) select "pqt", 1;
{code}

{noformat}
Hive creates folders like:
    /db1/table1/col3=10/0001_file.pqt
    /db1/table1/col3=10/0002_file.pqt
    /db1/table1/col3=20/0001_file.pqt
    /db1/table1/col3=20/0002_file.pqt
{noformat}

Hive users can now write new partitions in CarbonData. However, old partitions stay in Parquet and need migration scripts to move to the CarbonData format.

{code:sql}
alter table rtestpartition set fileformat carbondata;

insert into rtestpartition partition(col3=30) select "cdata", 1;
insert into rtestpartition partition(col3=40) select "cdata", 1;
{code}

{noformat}
Hive creates folders like:
    /db1/table1/col3=10/0001_file.pqt
    /db1/table1/col3=10/0002_file.pqt
    /db1/table1/col3=20/0001_file.pqt
    /db1/table1/col3=20/0002_file.pqt
    /db1/table1/col3=30/<carbondatafiles>
    /db1/table1/col3=40/<carbondatafiles>
{noformat}


  was:
The current partition implementation behaves like a database. To replace Parquet with CarbonData at scale, CarbonData must be usable in the same way as Parquet/ORC.

Hive users should be able to switch to CarbonData for all newly created partitions. Hive supports specifying the file format at the partition level.
Example:
{code:sql}
create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) stored as parquet;
insert into rtestpartition partition(col3=10) select "pqt", 1;
insert into rtestpartition partition(col3=20) select "pqt", 1;
insert into rtestpartition partition(col3=10) select "pqt", 1;
insert into rtestpartition partition(col3=20) select "pqt", 1;
{code}

{noformat}
Hive creates folders like:
    /db1/table1/col3=10/0001_file.pqt
                                     0002_file.pqt
    /db1/table1/col3=20/0001_file.pqt
                                     0002_file.pqt
{noformat}

Hive users can now write new partitions in CarbonData. However, old partitions stay in Parquet and need migration scripts to move to the CarbonData format.

alter table rtestpartition set fileformat carbondata;

insert into rtestpartition partition(col3=30) select "cdata", 1;
insert into rtestpartition partition(col3=30) select "cdata", 1;
insert into rtestpartition partition(col3=40) select "cdata", 1;
insert into rtestpartition partition(col3=40) select "cdata", 1;



> Implement hive partition
> ------------------------
>
>                 Key: CARBONDATA-1377
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1377
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: hive-integration
>            Reporter: cen yuhai
>
> The current partition implementation behaves like a database. To replace Parquet with CarbonData at scale, CarbonData must be usable in the same way as Parquet/ORC.
> Hive users should be able to switch to CarbonData for all newly created partitions. Hive supports specifying the file format at the partition level.
> Example:
> {code:sql}
> create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) stored as parquet;
> insert into rtestpartition partition(col3=10) select "pqt", 1;
> insert into rtestpartition partition(col3=20) select "pqt", 1;
> insert into rtestpartition partition(col3=10) select "pqt", 1;
> insert into rtestpartition partition(col3=20) select "pqt", 1;
> {code}
> {noformat}
> Hive creates folders like:
>     /db1/table1/col3=10/0001_file.pqt
>     /db1/table1/col3=10/0002_file.pqt
>     /db1/table1/col3=20/0001_file.pqt
>     /db1/table1/col3=20/0002_file.pqt
> {noformat}
> Hive users can now write new partitions in CarbonData. However, old partitions stay in Parquet and need migration scripts to move to the CarbonData format.
> {code:sql}
> alter table rtestpartition set fileformat carbondata;
> insert into rtestpartition partition(col3=30) select "cdata", 1;
> insert into rtestpartition partition(col3=40) select "cdata", 1;
> {code}
> {noformat}
> Hive creates folders like:
>     /db1/table1/col3=10/0001_file.pqt
>     /db1/table1/col3=10/0002_file.pqt
>     /db1/table1/col3=20/0001_file.pqt
>     /db1/table1/col3=20/0002_file.pqt
>     /db1/table1/col3=30/<carbondatafiles>
>     /db1/table1/col3=40/<carbondatafiles>
> {noformat}
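
As a rough illustration of the behavior described in this issue, the Python sketch below models a table whose file format is recorded per partition. It is a toy model, not Hive or CarbonData code: the class PartitionedTable and its methods are hypothetical names, and it assumes that changing the table-level format only affects partitions created afterwards, as the description states.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedTable:
    """Toy model of a Hive-style partitioned table (hypothetical, not a real API)."""
    location: str            # table directory, e.g. "/db1/table1"
    partition_column: str    # partition column, e.g. "col3"
    file_format: str         # table-level default, used for NEW partitions only
    partition_formats: dict = field(default_factory=dict)  # partition value -> format

    def partition_dir(self, value):
        # Partitions are laid out as <table location>/<column>=<value>.
        return f"{self.location}/{self.partition_column}={value}"

    def insert(self, value):
        # Assumption: a partition takes the table default when it is first
        # created and keeps that format for every later insert.
        fmt = self.partition_formats.setdefault(value, self.file_format)
        return self.partition_dir(value), fmt

    def set_file_format(self, new_format):
        # Models "alter table ... set fileformat ...": only the default changes,
        # existing partitions are left untouched.
        self.file_format = new_format


table = PartitionedTable("/db1/table1", "col3", "parquet")
table.insert(10)
table.insert(20)
table.set_file_format("carbondata")
table.insert(30)
table.insert(40)
table.insert(10)  # existing partition: still written as parquet

for value, fmt in sorted(table.partition_formats.items()):
    print(table.partition_dir(value), fmt)
```

In this model, col3=10 and col3=20 stay parquet while col3=30 and col3=40 become carbondata, which matches the folder layout in the description. Moving the old partitions over would still need a separate migration step.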



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)