http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010p8096.html
-------------------------------------------------------------------------------------------------------
> Yes, first we should simplify the DDL options. I propose the following
> options; please check whether they miss any scenario.
>
> 1. SORT_COLUMNS, or SORT_KEY
> This indicates three things:
> 1) All columns specified in this option will be used to construct the
> Multi-Dimensional Key (MDK), and the data will be sorted along this key
> 2) They will be encoded with an Inverted Index and thus sorted again within
> the column chunk in one blocklet
> 3) A Minmax index will also be created for these columns
>
> When to use: This option is designed to accelerate filter queries, so put
> all filter columns into this option. The column order can be either:
> 1) From low cardinality to high cardinality. This gives the best
> compression and fits scenarios that do not filter frequently on a high
> cardinality column
> 2) High cardinality column first, then the others. This fits scenarios
> that filter frequently on a high cardinality column
>
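A small sketch of why the column order matters for compression. This is not CarbonData code; the rows and the `count_runs` helper are made up for illustration, and run count is used only as a rough stand-in for how well a column compresses under run-length style encoding.

```python
from itertools import groupby


def count_runs(values):
    """Count runs of equal adjacent values; fewer runs compress better."""
    return sum(1 for _ in groupby(values))


# Hypothetical rows: (country, user_id).
# country has low cardinality, user_id has high cardinality.
rows = [("US", 7), ("CN", 3), ("US", 1), ("CN", 9), ("US", 4), ("CN", 6)]

# Ordering 1: low cardinality column first in the sort key.
low_first = sorted(rows, key=lambda r: (r[0], r[1]))
# Ordering 2: high cardinality column first in the sort key.
high_first = sorted(rows, key=lambda r: (r[1], r[0]))

# Compare how fragmented the country column is under each ordering.
runs_low_first = count_runs([r[0] for r in low_first])
runs_high_first = count_runs([r[0] for r in high_first])

print(runs_low_first, runs_high_first)
```

With the low cardinality column first, equal country values sit next to each other. With the high cardinality column first, the data is clustered by user_id instead, which helps filters on that column but fragments the other column.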
> For example, SORT_COLUMNS=“C1,C2,C3” means C1, C2, C3 form the MDK and are
> encoded with an Inverted Index and a Minmax Index.
> Note that C1, C2, C3 can be dimensions, but they can also be measures. So
> if the user needs to filter on a measure column, it can be put in the
> SORT_COLUMNS option.
>
> If this option is not specified by the user, carbon will pick the MDK as it
> does now.
>
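To make the sort plus Minmax idea concrete, here is a minimal sketch, assuming rows are plain dicts and a "blocklet" is just a fixed-size slice of the sorted rows. `build_blocklets` and `scan_equals` are hypothetical helpers, not CarbonData APIs, and the inverted index is left out.

```python
def build_blocklets(rows, sort_cols, blocklet_size):
    """Sort rows by the multi-column key, then cut them into blocklets
    and record a (min, max) pair per sort column for each blocklet."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in sort_cols))
    blocklets = []
    for start in range(0, len(ordered), blocklet_size):
        chunk = ordered[start:start + blocklet_size]
        minmax = {
            c: (min(r[c] for r in chunk), max(r[c] for r in chunk))
            for c in sort_cols
        }
        blocklets.append({"rows": chunk, "minmax": minmax})
    return blocklets


def scan_equals(blocklets, col, value):
    """Equality filter that skips blocklets whose min/max excludes value."""
    hits, scanned = [], 0
    for b in blocklets:
        lo, hi = b["minmax"][col]
        if value < lo or value > hi:
            continue  # pruned without reading any row
        scanned += 1
        hits.extend(r for r in b["rows"] if r[col] == value)
    return hits, scanned


pairs = [(3, 30), (1, 10), (2, 20), (1, 11), (3, 31), (2, 21), (4, 40), (4, 41)]
rows = [{"C1": c1, "C2": c2} for c1, c2 in pairs]

blocklets = build_blocklets(rows, ["C1", "C2"], blocklet_size=2)
hits, scanned = scan_equals(blocklets, "C1", 2)
print(len(blocklets), scanned, hits)
```

Because the rows are sorted on C1 first, the filter on C1 only has to read one of the four blocklets in this toy data.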
> 2. TABLE_DICTIONARY
> This specifies the table-level dictionary columns. A global dictionary
> will be created for all columns in this option on every data load.
>
> When to use: This option is designed to accelerate aggregate queries, so
> put group by columns into this option
>
> For example, TABLE_DICTIONARY=“C2,C3,C5”
>
> If this option is not specified by the user, all columns are encoded
> without global dictionary support. A normal shuffle on decoded values will
> be applied when doing a group by operation.
>
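A rough sketch of the difference between grouping on dictionary codes and grouping on decoded values, assuming a dictionary is simply a value-to-integer mapping. `build_dictionary` and `group_sum` are illustrative names, not CarbonData functions, and the data is invented.

```python
from collections import defaultdict


def build_dictionary(values):
    """Assign each distinct value a stable integer surrogate key."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def group_sum(keys, amounts):
    """Sum amounts per group key."""
    totals = defaultdict(int)
    for k, a in zip(keys, amounts):
        totals[k] += a
    return dict(totals)


cities = ["shenzhen", "beijing", "shenzhen", "bangalore", "beijing"]
amounts = [10, 20, 30, 40, 50]

# With a table-level dictionary: group on small integers, decode once at the end.
dictionary = build_dictionary(cities)
encoded = [dictionary[c] for c in cities]
totals_encoded = group_sum(encoded, amounts)
reverse = {i: v for v, i in dictionary.items()}
totals_decoded_late = {reverse[k]: t for k, t in totals_encoded.items()}

# Without it: group directly on the decoded string values.
totals_plain = group_sum(cities, amounts)

print(totals_decoded_late == totals_plain)
```

Both paths give the same totals. The dictionary path only compares and shuffles integer keys and decodes each group once, which is the benefit this option is aiming for.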
> I think these two options should be the basic options for normal users.
> Their goal is to satisfy most scenarios without deep tuning of the table.
> For advanced users who want to do deep tuning, we can debate adding more
> options. But first we need to identify which scenarios are not satisfied
> by these two options.
>
> Regards,
> Jacky