[DISCUSSION]support new feature: Partition Table

David CaiQiang
Hi all,

  Let's start the discussion on partition tables.

  What should we do to support partition tables?

  1. Create table with partitions: support Range Partitioning, Hash Partitioning, List Partitioning and Composite Partitioning, and write the partition info to the schema.

  2. During data loading, re-partition the input data, start one task to process each partition, and write the partition information to the footer and index file.

  3. During query, prune the B+Tree by partition if the filter contains the partition column, or prune data blocks by partition when the only predicate is on the partition column.

  4. Optimize the join performance of two partitioned tables when the partition column is the join column.
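
  To make items 1 and 3 more concrete, here is a rough Python sketch of how a partition id could be computed for the three single-level schemes, and how a range predicate on the partition column could be turned into a list of candidate partitions. This is only an illustration under my own assumptions: none of these function names exist in CarbonData, and the real schema and pruning code may look quite different.

```python
import bisect
import zlib

# Hypothetical helpers for illustration only; these are not CarbonData APIs.

def range_partition(value, upper_bounds):
    """Index of the first range whose exclusive upper bound is greater than value.

    upper_bounds must be sorted. Values at or above the last bound fall
    into one extra overflow partition with index len(upper_bounds).
    """
    return bisect.bisect_right(upper_bounds, value)

def hash_partition(value, num_partitions):
    """Deterministic hash partitioning (crc32 is stable across runs, unlike hash())."""
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions

def list_partition(value, value_lists):
    """Index of the list that contains value; unlisted values go to a default partition."""
    for idx, values in enumerate(value_lists):
        if value in values:
            return idx
    return len(value_lists)

def prune_range_partitions(upper_bounds, low, high):
    """Partition ids that may hold values in the closed interval [low, high]."""
    first = range_partition(low, upper_bounds)
    last = range_partition(high, upper_bounds)
    return list(range(first, last + 1))
```

  With the assumed bounds [10, 20, 30], a filter such as "col BETWEEN 12 AND 25" would only need partitions 1 and 2; the other partitions can be skipped without reading their blocks.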

   Any thoughts, comments or questions?

   Thanks!

Best Regards
David

Re:[DISCUSSION]support new feature: Partition Table

a
Additional suggestions:
1. Support at least two-level partitioning.
2. Building the B+Tree by partition column should split the segment into smaller pieces, which may speed up data loading in CarbonData.
3. Support deleting data by partition column.
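
As a rough illustration of what a two-level partition could mean (this is only my assumption, not an existing CarbonData design), the first level could be a range on one column and the second level a hash on another. The function and parameter names below are hypothetical.

```python
import bisect
import zlib

def two_level_partition(date_value, user_id, date_upper_bounds, num_hash_buckets):
    """Return (level1, level2): a range partition on the date, then a hash bucket on the id.

    date_upper_bounds is a sorted list of exclusive upper bounds. ISO-formatted
    date strings compare correctly as plain strings.
    """
    level1 = bisect.bisect_right(date_upper_bounds, date_value)
    level2 = zlib.crc32(str(user_id).encode("utf-8")) % num_hash_buckets
    return level1, level2
```

A row would then be stored under the pair (level1, level2), so a filter on the date alone could still prune at the first level.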



best regards
fish

At 2017-03-31 23:42:07, "QiangCai" <[hidden email]> wrote:


Re: [DISCUSSION]support new feature: Partition Table

Jacky Li
comments inline

> On April 1, 2017, at 5:06 PM, a <[hidden email]> wrote:
>
> Additional suggestions:
> 1. Support at least two-level partitioning

I think we can let the user specify the partition columns; multiple columns together can form a partition key. Is this what you mean by two-level partitioning? Generally speaking, partitioning on multiple columns tends to produce many small files, which we may want to avoid.
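
To show why multi-column keys tend toward small files, here is a back-of-the-envelope sketch. It assumes the worst case where every combination of column values actually occurs and each partition is written as at least one file; the function names and the numbers are made up for illustration.

```python
from math import prod

def max_partitions(cardinalities):
    """Upper bound on distinct partitions when every listed column is part of the key."""
    return prod(cardinalities)

def avg_rows_per_partition(total_rows, cardinalities):
    """Average rows per partition if rows are spread evenly over all combinations."""
    return total_rows / max_partitions(cardinalities)
```

For example, 10 million rows partitioned on one column with 100 distinct values average 100,000 rows per partition. Adding a second column with 365 distinct values gives up to 36,500 partitions and fewer than 300 rows in each.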

> 2. Building the B+Tree by partition column should split the segment into smaller pieces, which may speed up data loading in CarbonData

Partitioning will slow down the loading process because it requires a shuffle. The benefit is that queries with a filter on the partition key will be faster.
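
A small sketch of that trade-off, assuming simple hash partitioning on the key column: the load has to route every row to its partition (the shuffle), but an equality filter on the partition key then scans a single partition instead of the whole table. All names here are hypothetical and this is not how the CarbonData loader is implemented.

```python
from collections import defaultdict
import zlib

def partition_id(key, num_partitions):
    """Stable hash of the partition key (illustrative choice of hash function)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

def load(rows, key_column, num_partitions):
    """Shuffle step: route every row to the bucket of its partition id."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_id(row[key_column], num_partitions)].append(row)
    return partitions

def query_equals(partitions, key_column, value, num_partitions):
    """Equality filter on the partition key: scan only the one matching partition.

    Returns the matching rows and the number of rows that had to be scanned.
    """
    candidates = partitions.get(partition_id(value, num_partitions), [])
    matches = [row for row in candidates if row[key_column] == value]
    return matches, len(candidates)
```

The extra cost is paid once in load(); the saving is the difference between the scanned count returned by query_equals() and the total number of rows in the table.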

> 3. Delete data by partition column
>

This could be a future feature on our roadmap once the partition feature is supported.
