[ https://issues.apache.org/jira/browse/CARBONDATA-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qiuheng closed CARBONDATA-159.
------------------------------
    Resolution: Invalid

> carbon should support primary key & keep mapping table table_property
> ---------------------------------------------------------------------
>
>                 Key: CARBONDATA-159
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-159
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: core, data-load, data-query, file-format
>    Affects Versions: 0.1.0-incubating
>            Reporter: qiuheng
>              Labels: features
>             Fix For: 1.0.0-incubating
>
>   Original Estimate: 720h
>  Remaining Estimate: 720h
>
> As we know, Carbon supports the MDK index. By design, a filter or filter combination on the leftmost columns gets good performance.
> But if the leading key is a high-cardinality column (e.g. more than 100 million distinct values), only a filter on the leading key performs well. Filters on the following columns and on other high-cardinality columns do not, because those columns are close to unsorted.
> I suggest we add a key mapping function. The table property would look like:
> create table (columns ordered from low to high cardinality)
> table_property(
>   primary_key h_col3,
>   index_key_mapping(h_col1,h_col2)
> )
> low cardinality -> high cardinality
> col1,col2,col3,col4.....col10,h_col1,h_col2,h_col3
> During data loading, Carbon will create an internal index table A that records every (value --> position) pair of the primary_key, like:
> h_col3         list of blocklets
> 18682114091    [blockid1+blockletid1],[blockid4+blockletid10]....
> 18683343442    [blockid2+blockletid4],[blockid23+blockletid5]....
> ...            .....
> It will also create two key mapping tables:
> table 1:
> ---------------------------------------
> h_col2         h_col3
> jarray         18682114091
> ramana         18683343442
> ......         .......
> table 2:
> -----------------------------------------
> h_col1         h_col3
> 77647          18682114091
> 99899          18683343442
> ......         .......
> 1) If the filter is on col1-col10, the existing MDK capability is used.
> 2) If the filter is on h_col3 (the primary key), the system scans index table A to get the blocklet positions, then uses them to fetch the data directly.
> 3) If the filter is on h_col1 or h_col2, the system first scans the corresponding key mapping table to get the list of primary keys, then proceeds as in 2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
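The three lookup paths in the proposal can be illustrated with a small in-memory model: a filter on the primary key (h_col3) goes straight to index table A, a filter on a mapped column (h_col1 or h_col2) first goes through its key mapping table, and any other column falls back to the MDK index. This is a minimal sketch, assuming equality filters only and a (block id, blocklet id) pair as the position. The class `KeyMappingIndex` and its methods are hypothetical, not CarbonData APIs, and the sketch says nothing about how such tables would be persisted or scanned.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical position type: (block id, blocklet id).
Position = Tuple[str, str]


class KeyMappingIndex:
    """Hypothetical model of index table A plus the key mapping tables.

    primary_key: column indexed by index table A (e.g. "h_col3").
    mapped_keys: columns that get a key mapping table (e.g. "h_col1", "h_col2").
    """

    def __init__(self, primary_key: str, mapped_keys: List[str]) -> None:
        self.primary_key = primary_key
        # Index table A: primary key value -> blocklet positions holding it.
        self.index_table: Dict[object, Set[Position]] = defaultdict(set)
        # One mapping table per mapped column: column value -> primary key values.
        self.mapping_tables: Dict[str, Dict[object, Set[object]]] = {
            col: defaultdict(set) for col in mapped_keys
        }

    def load_row(self, row: Dict[str, object], position: Position) -> None:
        """Record one loaded row, written at `position`, in both structures."""
        pk = row[self.primary_key]
        self.index_table[pk].add(position)
        for col, table in self.mapping_tables.items():
            table[row[col]].add(pk)

    def blocklets_for(self, column: str, value: object) -> Optional[Set[Position]]:
        """Candidate blocklets for `column == value`; None means use MDK instead."""
        if column == self.primary_key:
            # Case 2: primary key filter -> index table A directly.
            return set(self.index_table.get(value, set()))
        if column in self.mapping_tables:
            # Case 3: mapping table -> primary keys -> index table A.
            result: Set[Position] = set()
            for pk in self.mapping_tables[column].get(value, set()):
                result |= self.index_table.get(pk, set())
            return result
        # Case 1: other columns (col1..col10) keep using the MDK index.
        return None


if __name__ == "__main__":
    idx = KeyMappingIndex("h_col3", ["h_col1", "h_col2"])
    idx.load_row({"col1": "a", "h_col1": 77647, "h_col2": "jarray", "h_col3": 18682114091},
                 ("blockid1", "blockletid1"))
    idx.load_row({"col1": "b", "h_col1": 77647, "h_col2": "jarray", "h_col3": 18682114091},
                 ("blockid4", "blockletid10"))
    idx.load_row({"col1": "c", "h_col1": 99899, "h_col2": "ramana", "h_col3": 18683343442},
                 ("blockid2", "blockletid4"))
    print(sorted(idx.blocklets_for("h_col2", "jarray")))
    # [('blockid1', 'blockletid1'), ('blockid4', 'blockletid10')]
```

Returning `None` rather than an empty set keeps "no matching rows" distinct from "this column is not covered, fall back to MDK". The cost of case 3 is one extra lookup per matching primary key, which is why the proposal routes mapped columns through the primary key instead of indexing every high-cardinality column separately.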