Apache CarbonData Dev Mailing List archive

[DISCUSSION]support new feature: Partition Table

3 messages

David CaiQiang

[DISCUSSION]support new feature: Partition Table

Hi all,

Let's start the discussion regarding the partition table.

To support partition tables, what should we do?

1. Support creating a table with partitions, covering Range Partitioning, Hash Partitioning, List Partitioning, and Composite Partitioning, and write the partition info to the schema.

2. During data loading, re-partition the input data, start one task to process each partition, and write the partition information to the footer and index file.

3. During a query, prune the B+Tree by partition if the filter contains the partition column, or prune data blocks by partition when the only predicate is on the partition column.

4. Optimize the join performance of two partitioned tables when the partition column is the join column.
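The four steps above can be sketched in a few lines of Python (chosen only for brevity; this is not CarbonData's implementation). It is a minimal illustration under stated assumptions: all class and function names are hypothetical, `zlib.crc32` is a stand-in hash, the list partitioner's extra "default" partition is an assumption, and composite partitioning (nesting one scheme inside another) is left out to keep it short. It shows how each scheme maps a value to a partition id, how a load groups rows so that one task can write one partition, how a filter on the partition column narrows the partitions to read, and why a join on the partition column only needs to pair partition i with partition i.

```python
import bisect
import zlib
from collections import defaultdict
from typing import Any, Dict, List, Sequence

# Hypothetical sketch: none of these names are CarbonData APIs.
Row = Dict[str, Any]


class RangePartitioner:
    """Partition i holds values in [bounds[i-1], bounds[i]); the last one is open-ended."""

    def __init__(self, bounds: Sequence[Any]):
        self.bounds = sorted(bounds)
        self.num_partitions = len(self.bounds) + 1

    def partition(self, value: Any) -> int:
        return bisect.bisect_right(self.bounds, value)

    def prune_range(self, low: Any, high: Any) -> List[int]:
        # A filter low <= col <= high can only hit partitions partition(low)..partition(high).
        return list(range(self.partition(low), self.partition(high) + 1))


class HashPartitioner:
    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions

    def partition(self, value: Any) -> int:
        # crc32 is a stand-in hash; it is stable across processes, unlike hash() on str.
        return zlib.crc32(str(value).encode("utf-8")) % self.num_partitions


class ListPartitioner:
    def __init__(self, value_lists: Sequence[Sequence[Any]]):
        self.lookup = {v: i for i, values in enumerate(value_lists) for v in values}
        # Assumption: values not in any list go to one extra "default" partition.
        self.num_partitions = len(value_lists) + 1

    def partition(self, value: Any) -> int:
        return self.lookup.get(value, self.num_partitions - 1)


def repartition(rows: List[Row], column: str, partitioner) -> Dict[int, List[Row]]:
    """Step 2: group input rows by partition id so one task can write each partition."""
    parts: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        parts[partitioner.partition(row[column])].append(row)
    return dict(parts)


def prune_equal(partitioner, value: Any) -> List[int]:
    """Step 3: an equality filter on the partition column touches a single partition."""
    return [partitioner.partition(value)]


def copartitioned_join(left: Dict[int, List[Row]], right: Dict[int, List[Row]], column: str) -> List[Row]:
    """Step 4: if both tables use the same partitioner on the join column, matching
    rows share a partition id, so each partition pair is joined independently."""
    joined: List[Row] = []
    for pid, left_rows in left.items():
        index: Dict[Any, List[Row]] = defaultdict(list)
        for rrow in right.get(pid, []):
            index[rrow[column]].append(rrow)
        for lrow in left_rows:
            for rrow in index.get(lrow[column], []):
                joined.append({**lrow, **rrow})
    return joined
```

For example, `RangePartitioner([10, 20])` sends 5 to partition 0, both 10 and 19 to partition 1, and 25 to partition 2, so a filter `12 <= col <= 19` prunes the scan to partition `[1]`.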

Any thoughts, comments, or questions?

Thanks!

Best Regards
David

Best Regards
David Cai

Re:[DISCUSSION]support new feature: Partition Table

Additional suggestions:
1. Support at least two-level partitioning.
2. Building the B+Tree by partition column should split the segment into smaller pieces, which may speed up data loading in CarbonData.
3. Support deleting data by partition column.
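The first and third suggestions can be illustrated together. A minimal sketch, assuming a hypothetical layout in which a two-level partition key (first-level value, second-level value) maps to the list of data files written for that partition; the file names and the `delete_by_partition` helper are invented for illustration. Deleting by partition then means dropping whole entries, optionally by first-level value only, without reading or rewriting any other partition's files.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (first-level value, second-level value) -> data files.
PartitionKey = Tuple[str, str]


def delete_by_partition(
    table: Dict[PartitionKey, List[str]],
    first: str,
    second: Optional[str] = None,
) -> List[str]:
    """Drop every partition whose key matches and return the files removed.

    With only `first` given, all second-level partitions under it are dropped.
    Files of non-matching partitions are never read or rewritten.
    """
    matching = [key for key in table if key[0] == first and (second is None or key[1] == second)]
    removed: List[str] = []
    for key in matching:
        removed.extend(table.pop(key))
    return removed


# Hypothetical table partitioned by month, then by city.
table = {
    ("2017-03", "beijing"): ["part-0.carbondata"],
    ("2017-03", "shanghai"): ["part-1.carbondata"],
    ("2017-04", "beijing"): ["part-2.carbondata"],
}
print(delete_by_partition(table, "2017-03", "shanghai"))  # ['part-1.carbondata']
print(sorted(table))  # [('2017-03', 'beijing'), ('2017-04', 'beijing')]
```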

best regards
fish

At 2017-03-31 23:42:07, "QiangCai" <[hidden email]> wrote:

>--
>View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-support-new-feature-Partition-Table-tp9935.html
>Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

Jacky Li

Re: [DISCUSSION]support new feature: Partition Table

comments inline

> On April 1, 2017, at 5:06 PM, a <[hidden email]> wrote:
>
> Additional suggestions:
> 1. Support at least two-level partitioning.

I think we can let the user specify the partition columns; multiple columns can together form the partition key. Is this what you mean by two-level partitioning? Generally speaking, partitioning on multiple columns tends to cause a small-files problem (the number of partitions grows with the product of each column's distinct values, so each partition holds less data), which we may want to avoid.
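The small-files concern can be made concrete with a little arithmetic. Assuming every combination of key values actually occurs and rows are spread evenly, the number of partitions is the product of the distinct-value counts of the partition columns, and the rows per partition shrink accordingly. The row count and cardinalities below are hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def partition_stats(total_rows: int, distinct_counts: Sequence[int]) -> Tuple[int, float]:
    """Return (number of partitions, average rows per partition).

    Assumes every combination of partition-key values occurs and rows are spread evenly.
    """
    num_partitions = prod(distinct_counts)
    return num_partitions, total_rows / num_partitions


# Hypothetical load of 100 million rows.
print(partition_stats(100_000_000, [365]))       # by date only
print(partition_stats(100_000_000, [365, 300]))  # by date and city
```

With these numbers, partitioning by date alone gives 365 partitions of about 274,000 rows each, while adding city gives 109,500 partitions of about 913 rows each; if every partition is written to at least one file per load, that is a large number of very small files.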

> 2. Building the B+Tree by partition column should split the segment into smaller pieces, which may speed up data loading in CarbonData.

Partitioning will slow down the loading process because the input data has to be shuffled. The benefit is that queries filtering on the partition key will be faster.
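Why a partitioned load needs a shuffle: each input split generally contains rows for many partitions, so after reading its split every task has to send rows to the tasks that own the other partitions. A toy sketch, assuming one task per input split, one partition per task, and `zlib.crc32` as a hypothetical hash partitioner (the helper names are invented); it counts the rows that cross task boundaries, which is the extra work the load pays so that a query filtering on the partition key can skip the other partitions.

```python
import zlib
from typing import List, Sequence, Tuple


def owner_task(value: str, num_tasks: int) -> int:
    # Hypothetical hash partitioning: load task t writes partition t only.
    return zlib.crc32(value.encode("utf-8")) % num_tasks


def shuffle(splits: Sequence[Sequence[str]]) -> Tuple[List[List[str]], int]:
    """Task i reads input split i, then sends each row to the task owning its partition.

    Returns the per-task partition contents and how many rows had to move
    between tasks, i.e. the extra traffic a partitioned load pays for.
    """
    num_tasks = len(splits)
    outputs: List[List[str]] = [[] for _ in range(num_tasks)]
    moved = 0
    for reader, split in enumerate(splits):
        for value in split:
            target = owner_task(value, num_tasks)
            outputs[target].append(value)
            if target != reader:
                moved += 1  # the row leaves the task that read it
    return outputs, moved
```

On the query side the payoff is the reverse: if rows are spread evenly over N partitions, an equality filter on the partition key reads roughly 1/N of the data.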

> 3. Support deleting data by partition column.
>

This could be a future feature on our roadmap once the partition feature is supported.
