Re: [New Feature] Adding bucketed table feature to Carbondata
Posted by
sraghunandan on
Nov 27, 2016; 5:54pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/New-Feature-Adding-bucketed-table-feature-to-Carbondata-tp3253p3254.html
How is this different from partitioning?
On Sun, 27 Nov 2016 at 11:21 PM, Ravindra Pesala <[hidden email]> wrote:
> Hi All,
>
> Bucketing hash-partitions the bucketed column into the configured number of
> buckets. Records with the same bucketed column value always go to the same
> bucket. Physically, each bucket is one or more files in the table
> directory.
> Advantages
> A bucketed table makes map-side joins possible and avoids shuffling data.
> Carbondata can do driver-level pruning on the bucketed column to improve
> query performance.
>
> Users can create a bucketed table as follows:
>
> CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
> CLUSTERED BY(user_id) INTO 32 BUCKETS;
>
> In the above example, the column user_id is hash partitioned into 32
> bucket/partition files in carbondata. When the table is joined with another
> table on the bucketed column, matching buckets can be selected and joined
> without shuffling.
>
> Carbon currently creates the following folder structure, since carbon
> already supports partitioning in its file format:
>
> dbName -> tableName -> Fact ->
>
>     Part0 -> Segment_id -> carbondatafiles
>
>     Part1 -> Segment_id -> carbondatafiles
>
> We could also move the partition id into the file metadata. But moving the
> partitionId into the metadata would complicate backward compatibility.
> --
> Thanks & Regards,
> Ravindra
>
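
For anyone following the thread, here is a rough sketch of the idea in the quoted proposal: rows are assigned to buckets by hashing the bucketed column, a join on that column only needs to pair bucket i with bucket i, and an equality filter only needs to read one bucket. This is an illustration only, not Carbondata code. The function names (bucket_of, write_buckets, bucketed_join, prune) are hypothetical, and plain modulo stands in for whatever hash function the real implementation would use.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the quoted DDL


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Assumed hash: plain modulo stands in for the real hash function."""
    return key % num_buckets


def write_buckets(rows, num_buckets=NUM_BUCKETS):
    """Group rows by the bucket of their first column (the bucketed column)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[0], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right):
    """Join bucket by bucket: bucket i of left only ever meets bucket i of right,
    so no rows have to move between buckets (no shuffle)."""
    joined = []
    for bucket_id in left.keys() & right.keys():
        # Build a small lookup over the right-hand bucket only.
        index = defaultdict(list)
        for row in right[bucket_id]:
            index[row[0]].append(row)
        for row in left[bucket_id]:
            for match in index.get(row[0], []):
                joined.append(row + match[1:])
    return joined


def prune(buckets, user_id):
    """Pruning for an equality filter: only one bucket needs to be read."""
    return [row for row in buckets.get(bucket_of(user_id), []) if row[0] == user_id]


users = write_buckets([(1, "a", "b"), (33, "c", "d"), (2, "e", "f")])
orders = write_buckets([(33, "book"), (2, "pen"), (5, "cup")])
print(sorted(bucketed_join(users, orders)))
print(prune(users, 33))
```

The join only works bucket by bucket if both tables are bucketed on the join column with the same bucket count and the same hash, which is what the sketch assumes.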