Re: [DISCUSS] For the dimension default should be no dictionary

Posted by bill.zhou on Feb 27, 2017; 7:08am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010p8027.html

Dear Vishal & Ravindra
 
  Thanks for your replies. I think I didn't describe my proposal clearly, so you didn't get the full idea.
1. Dictionary is an important feature in CarbonData, and we introduce it to every new customer. So a new customer will understand it clearly and will set the dictionary columns when creating a table.
2. All our customers, such as bank, telecom and traffic customers, share the same scenario: they have many columns but set only a few of them as dictionary.
    For example, a telecom customer with 300 columns sets only 5 columns as dictionary; the other dimensions are not set as dictionary.
    A bank customer with 100 columns sets only about 5 columns as dictionary; the other dimensions are not set as dictionary.
In customers' actual usage today, they set as dictionary only the dimensions used for filter and group by.
3. My suggestion is that the no-dictionary default applies only to dimensions that are not listed in the dictionary_include property, not to all dimension columns. If a customer always adds the same 5 columns to dictionary_include and leaves the other columns as no dictionary, this will not impact query performance.

So I suggest that any dimension column not added to the dictionary_include property should default to no dictionary.
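
To make the difference in DDL size concrete, here is a rough Python sketch (not CarbonData code) that builds the table properties clause under both defaults for the 200-dimension, 5-dictionary-column case from this thread. The table name, column names and the helper `create_table_ddl` are hypothetical, the `STORED BY 'carbondata'` form is assumed from the DDL of that time, and the "proposed" branch models the suggestion above, not existing behaviour.

```python
def create_table_ddl(table, columns, dictionary_cols, default_is_dictionary):
    """Build a CarbonData-style CREATE TABLE statement (illustrative only).

    columns: dimension column names (hypothetical schema, all STRING).
    dictionary_cols: the columns the user wants dictionary-encoded.
    default_is_dictionary: True models the current default described in
    this thread; False models the proposed default.
    """
    col_defs = ", ".join(f"{c} STRING" for c in columns)
    wanted = set(dictionary_cols)
    if default_is_dictionary:
        # Current default: every dimension that should NOT be dictionary
        # has to be listed explicitly.
        listed = [c for c in columns if c not in wanted]
        prop = "DICTIONARY_EXCLUDE"
    else:
        # Proposed default: only the dictionary columns are listed.
        listed = [c for c in columns if c in wanted]
        prop = "DICTIONARY_INCLUDE"
    ddl = f"CREATE TABLE {table} ({col_defs}) STORED BY 'carbondata'"
    if listed:
        ddl += f" TBLPROPERTIES ('{prop}'='{','.join(listed)}')"
    return ddl, len(listed)


dims = [f"dim{i}" for i in range(200)]   # 200 dimensions, as in the POC
dict_cols = dims[:5]                     # only 5 used for filter / group by

ddl_current, n_current = create_table_ddl("poc", dims, dict_cols, True)
ddl_proposed, n_proposed = create_table_ddl("poc", dims, dict_cols, False)
print(n_current, n_proposed)  # 195 5
```

The sketch only counts how many column names the user has to type; it says nothing about store size or query performance, which are the concerns raised in the replies below.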

Regards
Bill


kumarvishal09 wrote
Hi,
    I completely agree with Ravindra's points. A larger number of no-dictionary
columns will impact both IO reading and writing, because without a dictionary
the data size increases. Late decoding is one of the main advantages of
dictionary, so aggregation on no-dictionary columns will be slower. Filter
queries will also suffer: for a dictionary column we compare on the byte-packed
value, whereas for a no-dictionary column we compare on the actual value.
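
As a simplified illustration of the two points above, the following Python sketch shows generic dictionary encoding: filtering by comparing small integer keys, and late decoding where aggregation runs on keys and only the result groups are decoded. It is not CarbonData's actual byte-packed storage format; the function `dictionary_encode` and the sample values are made up for this example.

```python
from collections import Counter


def dictionary_encode(values):
    """Map each distinct value to a small integer surrogate key."""
    dictionary = {}
    encoded = []
    for v in values:
        # A new value gets the next free key; a known value reuses its key.
        key = dictionary.setdefault(v, len(dictionary))
        encoded.append(key)
    return dictionary, encoded


cities = ["shenzhen", "bangalore", "shenzhen", "beijing", "bangalore", "shenzhen"]
dictionary, encoded = dictionary_encode(cities)

# Filter: look up the key once, then compare integers instead of strings.
target = dictionary["shenzhen"]
matches = [i for i, k in enumerate(encoded) if k == target]

# Late decoding: aggregate on the keys, decode only the distinct groups.
reverse = {k: v for v, k in dictionary.items()}
counts = {reverse[k]: n for k, n in Counter(encoded).items()}

print(matches)  # [0, 2, 5]
print(counts)   # {'shenzhen': 3, 'bangalore': 2, 'beijing': 1}
```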

-Regards
Kumar Vishal

On Mon, Feb 27, 2017 at 12:34 AM, Ravindra Pesala <[hidden email]>
wrote:

> Hi,
>
> I feel there are more disadvantages than advantages in this approach. In
> your current scenario you want to set dictionary only for columns which are
> used as filters, but the usage of dictionary is not limited to
> filters; it can also reduce the store size and improve aggregation queries.
> I think you should set no_inverted_index false on non filtered columns to
> reduce the store size and improve the performance.
>
> If we make no dictionary the default, then the user need not set those
> columns in DDL, but the user still needs to set the dictionary columns. If
> the user wants to set more dictionary columns, then the same problem you
> mentioned arises again, so it does not solve the problem. I feel we should
> give more flexibility in our DDL to simplify these scenarios, and we should
> have more discussion on it.
>
> Pros & Cons of your suggestion.
> Advantages :
> 1. Decoding/Encoding of dictionary could be avoided.
>
> Disadvantages :
> 1. Store size will increase drastically.
> 2. IO will increase so query performance will come down.
> 3. Aggregation queries performance will suffer.
>
>
>
> Regards,
> Ravindra.
>
> On 26 February 2017 at 20:04, bill.zhou <[hidden email]> wrote:
>
> > hi All
> >     Currently, when creating a CarbonData table, if a dimension is not
> > added to the dictionary_exclude property, the dimension is treated as a
> > dictionary column by default. I think the default should be no dictionary.
> >
> >     For example, when I did a POC for one customer, the table had 300
> > columns and 200 dimensions, but only 5 columns were used for filters, so he
> > only needs to set these 5 columns as dictionary and leave the other 195
> > columns as no dictionary. But now he needs to specify the 195 columns in
> > the dictionary_exclude property, which wastes time and makes the create
> > table command huge, and also impacts the load performance.
> >
> >     So I suggest the dimension default should be no dictionary; this can
> > also help customers easily know which dictionary columns are useful.
> >
> >
> >
> >
>
>
>
> --
> Thanks & Regards,
> Ravi
>