http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010p8111.html
dictionary as default. We have initially introduced no dictionary columns
for better compression and better filter queries as well. With out
> Hi
>
> A couple of questions:
>
> 1) For the SORT_KEY option: are the "MDK index, inverted index, minmax
> index" built only for the columns specified in the option (SORT_KEY)?
>
> 2) If users don't specify TABLE_DICTIONARY, then no columns get
> dictionary encoding, and all shuffle operations are based on the actual
> values. Is my understanding right?
> ------------------------------------------------------------
> If this option is not specified by the user, all columns are encoded without
> global dictionary support. A normal shuffle on decoded values will be applied
> when doing a group-by operation.
>
> 3) After introducing the two options "SORT_KEY and TABLE_DICTIONARY",
> suppose "C2" is specified in SORT_KEY but not in
> TABLE_DICTIONARY. How does the system handle this case?
> ------------------------------------------------------------
> For example, SORT_COLUMNS=“C1,C2,C3” means C1, C2, C3 form the MDK, are
> encoded with an Inverted Index, and get a Minmax Index.
>
> Regards
> Liang
>
> 2017-02-28 19:35 GMT+08:00 Jacky Li <[hidden email]>:
>
> > Yes, first we should simplify the DDL options. I propose the following
> > options; please check whether they miss any scenario.
> >
> > 1. SORT_COLUMNS, or SORT_KEY
> > This indicates three things:
> > 1) All columns specified in the option will be used to construct the
> > Multi-Dimensional Key (MDK), and the data will be sorted along this key
> > 2) They will be encoded as Inverted Index and thus sorted again within
> > the column chunk in one blocklet
> > 3) A Minmax index will also be created for these columns
> >
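The three points above can be sketched in a few lines of Python. This is only a toy model, not CarbonData's implementation: the function names, the two-row blocklet size, and the reading of "inverted index" as "sorted values plus original row positions" are all assumptions made for illustration.

```python
# Toy model of SORT_COLUMNS: MDK sort, per-blocklet minmax, inverted index.
# All names here are hypothetical; this is not CarbonData code.

def sort_by_mdk(rows, sort_columns):
    """Sort rows by the composite key built from sort_columns, in order."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in sort_columns))

def split_into_blocklets(rows, size):
    """Cut the sorted rows into fixed-size groups (stand-ins for blocklets)."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def build_minmax(blocklets, sort_columns):
    """Record (min, max) per sort column for each blocklet."""
    return [
        {c: (min(r[c] for r in b), max(r[c] for r in b)) for c in sort_columns}
        for b in blocklets
    ]

def prune(minmax, column, value):
    """Return indices of blocklets whose [min, max] range may contain value."""
    return [i for i, m in enumerate(minmax)
            if m[column][0] <= value <= m[column][1]]

def inverted_index(values):
    """Sort one column chunk and keep the original row position of each value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [values[i] for i in order], order

rows = [
    {"C1": 2, "C2": "b", "C3": 10},
    {"C1": 1, "C2": "a", "C3": 30},
    {"C1": 2, "C2": "a", "C3": 20},
    {"C1": 1, "C2": "b", "C3": 40},
]
sort_columns = ["C1", "C2", "C3"]
sorted_rows = sort_by_mdk(rows, sort_columns)
blocklets = split_into_blocklets(sorted_rows, 2)
minmax = build_minmax(blocklets, sort_columns)

# A filter on the leading key column skips a blocklet; a filter on a
# trailing column may not, because its values are spread across blocklets.
print(prune(minmax, "C1", 2))
print(prune(minmax, "C2", "a"))
```

In this toy data the filter on C1 touches one blocklet while the filter on C2 touches both, which is why the order of the columns in the option matters.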
> > When to use: This option is designed for accelerating filter queries, so
> > put all filter columns into this option. The order can be either:
> > 1) From low cardinality to high cardinality. This gives the most
> > compression and fits scenarios without frequent filters on
> > high-cardinality columns.
> > 2) High-cardinality column first, then the others. This fits
> > frequent filters on a high-cardinality column.
> >
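The two ordering strategies above could be derived from a data sample roughly as follows. This is a sketch under assumptions: the column names (`country`, `city`, `user_id`) and the helper `order_sort_columns` are hypothetical, and cardinality is simply counted as the number of distinct values in the sample.

```python
# Hypothetical helper for choosing the order of SORT_COLUMNS.

def cardinality(rows, column):
    """Number of distinct values of a column in the sample."""
    return len({r[column] for r in rows})

def order_sort_columns(rows, filter_columns, frequently_filtered=()):
    """Order columns low-to-high cardinality (strategy 1), optionally
    moving frequently filtered columns to the front (strategy 2)."""
    low_to_high = sorted(filter_columns, key=lambda c: cardinality(rows, c))
    head = [c for c in frequently_filtered if c in filter_columns]
    tail = [c for c in low_to_high if c not in head]
    return head + tail

sample = [
    {"country": "CN", "city": "Shenzhen", "user_id": 1},
    {"country": "CN", "city": "Beijing", "user_id": 2},
    {"country": "IN", "city": "Bangalore", "user_id": 3},
    {"country": "CN", "city": "Shenzhen", "user_id": 4},
]
columns = ["user_id", "city", "country"]

# Strategy 1: favour compression.
strategy_1 = order_sort_columns(sample, columns)
# Strategy 2: frequent filters on the high-cardinality user_id column.
strategy_2 = order_sort_columns(sample, columns, frequently_filtered=["user_id"])
print(",".join(strategy_1))
print(",".join(strategy_2))
```

The resulting comma-separated list is what a user would then pass as the value of the option.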
> > For example, SORT_COLUMNS=“C1,C2,C3” means C1, C2, C3 form the MDK, are
> > encoded with an Inverted Index, and get a Minmax Index.
> > Note that C1, C2, C3 can be dimensions, but they can also be measures. So
> > if a user needs to filter on a measure column, it can be put in the
> > SORT_COLUMNS option.
> >
> > If this option is not specified by the user, carbon will pick the MDK as
> > it does now.
> >
> > 2. TABLE_DICTIONARY
> > This specifies the table-level dictionary columns. A global dictionary
> > will be created for all columns in this option on every data load.
> >
> > When to use: This option is designed for accelerating aggregate queries,
> > so put group-by columns into this option.
> >
> > For example, TABLE_DICTIONARY=“C2,C3,C5”
> >
> > If this option is not specified by the user, all columns are encoded
> > without global dictionary support. A normal shuffle on decoded values
> > will be applied when doing a group-by operation.
> >
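To illustrate why a global dictionary can help group-by, here is a minimal sketch: values are replaced by integer codes, the aggregation runs on the codes, and decoding happens once at the end. The function names and sample data are hypothetical, and the assumption that the speed-up comes from grouping on compact integer codes is an interpretation, not something stated in the proposal.

```python
# Toy model of TABLE_DICTIONARY: group by on dictionary codes, decode late.
from collections import defaultdict

def build_dictionary(values):
    """Assign an integer code to each distinct value (sorted for determinism)."""
    return {v: code for code, v in enumerate(sorted(set(values)), start=1)}

def encode(values, dictionary):
    """Replace each value by its dictionary code."""
    return [dictionary[v] for v in values]

def group_sum_on_codes(codes, measures):
    """Aggregate a measure per dictionary code without touching the strings."""
    totals = defaultdict(int)
    for code, measure in zip(codes, measures):
        totals[code] += measure
    return dict(totals)

def decode(totals, dictionary):
    """Translate the codes back to the original values after aggregation."""
    reverse = {code: v for v, code in dictionary.items()}
    return {reverse[code]: total for code, total in totals.items()}

c2 = ["shoes", "hats", "shoes", "bags", "hats"]   # hypothetical group-by column
sales = [10, 5, 7, 3, 1]                          # hypothetical measure

dictionary = build_dictionary(c2)
codes = encode(c2, dictionary)
totals_by_code = group_sum_on_codes(codes, sales)
result = decode(totals_by_code, dictionary)
print(result)
```

Without the dictionary, the same aggregation would have to group (and, in a distributed engine, shuffle) on the decoded string values, which is the fallback behaviour described above.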
> > I think these two options should be the basic options for normal users;
> > the goal is to satisfy most scenarios without deep tuning of the table.
> > For advanced users who want to do deep tuning, we can discuss adding more
> > options. But we first need to identify which scenarios are not satisfied
> > by these two options.
> >
> > Regards,
> > Jacky
> >