[ https://issues.apache.org/jira/browse/CARBONDATA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravindra Pesala reassigned CARBONDATA-466:
------------------------------------------
Assignee: Ravindra Pesala
> Implement bucketing table in carbondata
> ---------------------------------------
>
> Key: CARBONDATA-466
> URL: https://issues.apache.org/jira/browse/CARBONDATA-466
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Ravindra Pesala
> Assignee: Ravindra Pesala
>
> Bucketing is a useful feature when users want to join large tables. It is also useful for driver-level partition pruning, which improves query performance.
> Users can define buckets on any dimension column (except complex types) as follows:
> {code}
> CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
> CLUSTERED BY(user_id) INTO 32 BUCKETS
> STORED BY 'carbondata';
> {code}
> In the above example, the user_id column is hash partitioned, which creates 32 bucket files in carbondata. When this table is joined with another table on the bucketed column, the same buckets can be selected on both sides and joined without shuffling.
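> A minimal sketch of the idea, not CarbonData's implementation: rows are assigned to a bucket by hashing the bucketing column, and two tables bucketed the same way can be joined one bucket at a time. The modulo hash and the helper names (bucket_id, bucketize, bucketed_join) are assumptions made for illustration only.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL above


def bucket_id(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket. Plain modulo is an assumed stand-in hash."""
    return key % num_buckets


def bucketize(rows, key_index, num_buckets=NUM_BUCKETS):
    """Group rows into buckets by the hash of the bucketing column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right, num_buckets=NUM_BUCKETS):
    """Inner-join two tables on column 0, one bucket at a time.

    Rows with equal keys always land in the same bucket id on both sides,
    so each bucket pair can be joined independently (no shuffle needed).
    """
    left_b = bucketize(left, 0, num_buckets)
    right_b = bucketize(right, 0, num_buckets)
    joined = []
    for b in range(num_buckets):
        # Build a small lookup table for the right side of this bucket only.
        lookup = defaultdict(list)
        for row in right_b.get(b, []):
            lookup[row[0]].append(row)
        for row in left_b.get(b, []):
            for match in lookup.get(row[0], []):
                joined.append(row + match[1:])
    return joined
```

> Because both sides use the same hash and the same bucket count, bucket b of one table only ever needs to be compared with bucket b of the other.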
> Carbon format changes:
> 1. Bucketing information needs to be stored inside the schema thrift file.
> 2. The bucket id can be stored inside every carbondata index file.
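> As a rough illustration of why the bucket id is worth recording per index file, the sketch below shows how a driver could prune index files for an equality filter on the bucketed column. The IndexFileEntry type, the file names, and the modulo hash are all hypothetical and are not taken from the CarbonData format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexFileEntry:
    path: str       # hypothetical index file name
    bucket_id: int  # bucket id recorded alongside the index file


def bucket_for(value: int, num_buckets: int) -> int:
    """Assumed hash: plain modulo, standing in for the real function."""
    return value % num_buckets


def prune(entries, filter_value, num_buckets):
    """Keep only index files whose bucket can contain filter_value."""
    target = bucket_for(filter_value, num_buckets)
    return [e for e in entries if e.bucket_id == target]
```

> With one index file per bucket and 32 buckets, a filter such as user_id = 70 would leave a single index file to read in this sketch instead of all 32.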
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)