[ https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Cai closed CARBONDATA-910.
--------------------------------
    Resolution: Invalid

deprecated since 2.0

> Implement Partition feature
> ---------------------------
>
>                 Key: CARBONDATA-910
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-910
>             Project: CarbonData
>          Issue Type: New Feature
>          Components: core, data-load, data-query
>            Reporter: Cao, Lionel
>            Assignee: Cao, Lionel
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Why a partition table is needed
> A partition table provides an option to divide a table into smaller pieces.
> With a partition table:
> 1. Data can be better managed, organized and stored.
> 2. We can avoid a full table scan in some scenarios and improve query performance (partition column in the filter,
>    multiple partition tables joined on the same partition column, etc.).
> Partitioning design
> Range Partitioning
>     Range partitioning maps data to partitions according to the range of partition column values; the operator '<' defines the non-inclusive upper bound of the current partition.
> List Partitioning
>     List partitioning lets you map data to partitions by specific value lists.
> Hash Partitioning
>     Hash partitioning maps data to a given number of partitions using a hash algorithm.
> Composite Partitioning (2 levels at most for now)
>     Range-Range, Range-List, Range-Hash, List-Range, List-List, List-Hash, Hash-Range, Hash-List, Hash-Hash
> DDL-Create
> Create table sales(
>     itemid long,
>     logdate datetime,
>     customerid int
>     ...
>     ...)
> [partition by range logdate(...)]
> [subpartition by list area(...)]
> Stored By 'carbondata'
> [tblproperties(...)];
> range partition:
>     partition by range logdate(< '2016-01-01', < '2017-01-01', < '2017-02-01', < '2017-03-01', < '2099-01-01')
> list partition:
>     partition by list area('Asia', 'Europe', 'North America', 'Africa', 'Oceania')
> hash partition:
>     partition by hash(itemid, 9)
> composite partition:
>     partition by range logdate(< '2016-01-01', < '2017-01-01', < '2017-02-01', < '2017-03-01', < '2099-01-01')
>     subpartition by list area('Asia', 'Europe', 'North America', 'Africa', 'Oceania')
> DDL-Rebuild, Add
> Alter table sales rebuild partition by (range|list|hash)(...);
> Alter table sales add partition (< '2018-01-01'); # only range partitioning and list partitioning are supported
> Alter table sales add partition ('South America');
> # Note: There is no delete operation for partitions; please use rebuild.
> To delete data, use a delete statement; the partition definition will not be deleted.
> Partition Table Data Store
> [Option One]
> Use the current design and keep the partition folder outside of segments.
> Fact
> |___Part0
> |   |___Segment_0
> |   |   |___ *******-[bucketId]-.carbondata
> |   |   |___ *******-[bucketId]-.carbondata
> |   |___Segment_1
> |   ...
> |___Part1
> |   |___Segment_0
> |   |___Segment_1
> |...
> [Option Two]
> Remove the partition folder, add the partition id to the file name and build the btree on the driver side.
> Fact
> |___Segment_0
> |   |___ *******-[bucketId]-[partitionId].carbondata
> |   |___ *******-[bucketId]-[partitionId].carbondata
> |___Segment_1
> |___Segment_2
> ...
> Pros & Cons:
> Option one would be faster at locating target files.
> Option two needs to store more metadata of folders.
> Partition Table MetaData Store
> Partition info should be stored in the file footer/index file and loaded into memory before user queries.
> Relationship with Bucket
> Buckets should sit at a lower level than partitions.
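
As an illustration of the range, list and hash schemes in the quoted description, here is a minimal Python sketch of how a partition column value could be mapped to a partition id. The function names and the BOUNDS and AREAS constants are hypothetical and are not taken from CarbonData; Python's built-in hash() stands in for the hash algorithm, which the description does not specify.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical constants mirroring the DDL examples in the description.
BOUNDS = [date(2016, 1, 1), date(2017, 1, 1), date(2017, 2, 1),
          date(2017, 3, 1), date(2099, 1, 1)]
AREAS = ['Asia', 'Europe', 'North America', 'Africa', 'Oceania']


def range_partition_id(value, upper_bounds):
    """Return the index of the first partition whose exclusive upper bound
    is greater than value ('<' is a non-inclusive upper bound)."""
    idx = bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError(f"{value!r} is not below any partition bound")
    return idx


def list_partition_id(value, values):
    """Return the index of the list entry equal to value."""
    try:
        return values.index(value)
    except ValueError:
        raise ValueError(f"{value!r} is not in the partition value list")


def hash_partition_id(value, num_partitions):
    """Map value to one of num_partitions buckets.
    Assumption: built-in hash() is only a stand-in; it is stable for ints
    but randomized per process for strings."""
    return hash(value) % num_partitions
```

With these definitions, a logdate equal to a bound such as 2016-01-01 falls into the next partition rather than the one it bounds, which is what the non-inclusive '<' implies.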
> Partition Table Query
> Example:
> Select * from sales
> where logdate <= date '2016-12-01';
> Users should remember to add a partition filter when writing SQL against a partition table.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
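
The query example in the description only avoids a full scan if range partitions that cannot satisfy the filter are skipped. The sketch below shows one way that pruning could work for a `logdate <= X` filter. It is an assumption for illustration, not CarbonData's implementation; `prune_le` and `BOUNDS` are hypothetical names.

```python
from datetime import date

# Hypothetical exclusive upper bounds, as in the range partition DDL example.
BOUNDS = [date(2016, 1, 1), date(2017, 1, 1), date(2017, 2, 1),
          date(2017, 3, 1), date(2099, 1, 1)]


def prune_le(upper_bounds, x):
    """Return ids of range partitions that may hold rows with value <= x.

    Partition i covers [upper_bounds[i-1], upper_bounds[i]); partition 0 has
    no lower bound. A partition can match only if its lower bound is <= x.
    """
    return [i for i in range(len(upper_bounds))
            if i == 0 or upper_bounds[i - 1] <= x]


# Partitions to scan for: where logdate <= date '2016-12-01'
partitions_to_scan = prune_le(BOUNDS, date(2016, 12, 1))
```

For the filter in the example, only the first two partitions remain, so the partitions bounded by 2017-02-01 and later would not need to be read.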