Ravindra Pesala created CARBONDATA-466:
------------------------------------------
Summary: Implement bucketing table in carbondata
Key: CARBONDATA-466
URL: https://issues.apache.org/jira/browse/CARBONDATA-466
Project: CarbonData
Issue Type: New Feature
Reporter: Ravindra Pesala
Bucketing is a useful feature when a user wants to join big tables. It also enables partition pruning at the driver level, which improves query performance.
Users can add buckets on any dimension column (except complex types) as follows:
{code}
CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
CLUSTERED BY(user_id) INTO 32 BUCKETS
STORED BY 'carbondata';
{code}
In the above example, the column user_id is hash-partitioned into 32 bucket files in carbondata. When joining with another table on the bucketed column, it can select the same buckets from both tables and perform the join without shuffling.
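To illustrate why matching buckets allow a join without shuffling, here is a minimal Python sketch. It is not CarbonData code: the hash (a plain modulo on the integer key) and the helper names bucket_id, bucketize and bucketed_join are assumptions made for illustration, and the actual hash function may differ. The point is that rows with equal keys always land in the same bucket number, so bucket b of one table only ever needs to be compared with bucket b of the other.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL above


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Assumed hash: plain modulo on an integer key (illustrative only).
    return key % num_buckets


def bucketize(rows, key_index, num_buckets=NUM_BUCKETS):
    # Group rows by the bucket their key hashes to.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right, left_key=0, right_key=0, num_buckets=NUM_BUCKETS):
    # Join bucket b of the left table only with bucket b of the right table.
    left_buckets = bucketize(left, left_key, num_buckets)
    right_buckets = bucketize(right, right_key, num_buckets)
    joined = []
    for b in range(num_buckets):
        # Build a lookup over this bucket's right-side rows only.
        lookup = defaultdict(list)
        for r in right_buckets.get(b, []):
            lookup[r[right_key]].append(r)
        for l in left_buckets.get(b, []):
            for r in lookup.get(l[left_key], []):
                joined.append(l + r)
    return joined


# Hypothetical sample data: (user_id, firstname, lastname) and (user_id, item).
users = [(1, "Ada", "Lovelace"), (33, "Alan", "Turing"), (2, "Grace", "Hopper")]
orders = [(33, "book"), (1, "pen"), (1, "ink"), (7, "cup")]
result = bucketed_join(users, orders)
```

In this sketch user_id 1 and user_id 33 both fall into bucket 1, so they are matched against the orders in bucket 1 only. No row ever has to be compared with a row from a different bucket, which is what removes the need for a shuffle.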
Carbon format changes:
1. Bucketing information needs to be stored inside the schema thrift file.
2. The bucket id can be stored inside every carbondata index file.
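As a rough illustration of how a bucket id stored in each index file could support driver-level pruning, the Python sketch below filters index entries for an equality predicate on the bucketed column. Everything here is assumed for illustration: the IndexEntry class, the file names, the prune_by_bucket helper and the modulo hash are hypothetical and do not reflect the actual carbondata index format.

```python
from dataclasses import dataclass

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL above


@dataclass(frozen=True)
class IndexEntry:
    # Hypothetical stand-in for the metadata of one carbondata index file.
    path: str
    bucket_id: int


def prune_by_bucket(entries, key, num_buckets=NUM_BUCKETS):
    # Assumed hash: plain modulo on an integer key (illustrative only).
    target = key % num_buckets
    # Keep only the index files whose stored bucket id can contain the key.
    return [e for e in entries if e.bucket_id == target]


# Hypothetical file names, one index entry per bucket.
entries = [IndexEntry(f"bucket-{b}.index", b) for b in range(NUM_BUCKETS)]

# A filter such as "WHERE user_id = 33" needs to read one bucket out of 32.
selected = prune_by_bucket(entries, key=33)
```

Under these assumptions the driver can discard 31 of the 32 index entries before any data file is opened.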