http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010p8043.html
not be perfect solution. Basically no-dictionary columns are only meant for
In the above case, C1, C2 and C3 are sort columns and part of the MDK key. And
scenarios. We can have more discussion towards it to simplify the DDL.
> Dear Vishal & Ravindra
>
> Thanks for your reply. I think I didn't describe it clearly, so you
> didn't get the full idea.
> 1. Dictionary is an important feature in CarbonData, and we introduce it to
> every new customer. So a new customer will understand it clearly and will
> set the dictionary columns when creating a table.
> 2. All customers, such as bank, telecom and traffic customers, share the
> same scenario: they have many columns but set only a few of them as
> dictionary.
> For example, a telecom customer with 300 columns sets only 5 columns as
> dictionary; the other dims are not set as dictionary.
> A bank customer with 100 columns sets only about 5 columns as dictionary;
> the other dims are not set as dictionary.
> *In current customers' actual usage, they only set the dims used for
> filter and group by as dictionary*
> 3. My suggestion is that no dictionary becomes the default only for the
> dims which are not put into the dictionary_include property, not for all
> dim columns. If a customer always adds the 5 columns they use to
> dictionary_include and leaves the other columns as no dictionary, this will
> not impact query performance.
>
> So I suggest that dim columns which are not added to the
> dictionary_include property default to no dictionary.
>
> Regards
> Bill
>
>
>
> kumarvishal09 wrote
> > Hi,
> > I completely agree with Ravindra's points. A larger number of no
> > dictionary columns will impact both IO reading and writing, because with
> > no dictionary the data size will increase. Late decoding is one of the
> > main advantages of dictionary, so aggregation on no dictionary columns
> > will be slower. Filter queries will also suffer: for a dictionary column
> > we compare on the byte-packed value, while for a no dictionary column we
> > compare on the actual value.
> >
> > -Regards
> > Kumar Vishal
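
A minimal sketch of the filter and late-decoding points above, written in Python rather than taken from CarbonData. The sample values, the `build_dictionary` helper and the plain integer surrogate keys are assumptions for illustration only; CarbonData's actual byte-packed layout is not reproduced here.

```python
from collections import Counter


def build_dictionary(values):
    """Map each distinct value to a small integer surrogate key (sorted order)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


# Hypothetical dimension column with few distinct values.
column = ["beijing", "shenzhen", "beijing", "bangalore", "shenzhen", "beijing"]
dictionary = build_dictionary(column)
encoded = [dictionary[v] for v in column]

# No dictionary: the filter compares the actual string on every row.
rows_raw = [i for i, v in enumerate(column) if v == "beijing"]

# Dictionary: look the literal up once, then compare integers on every row.
key = dictionary["beijing"]
rows_encoded = [i for i, k in enumerate(encoded) if k == key]

# Late decoding: aggregate on the keys, decode only the final groups.
counts_by_key = Counter(encoded)
reverse = {k: v for v, k in dictionary.items()}
counts = {reverse[k]: n for k, n in counts_by_key.items()}

print(rows_raw == rows_encoded, counts)
```

Both filters select the same rows; the difference is only in what gets compared per row, and the group-by touches strings once per group instead of once per row.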
> >
> > On Mon, Feb 27, 2017 at 12:34 AM, Ravindra Pesala <ravi.pesala@> wrote:
> >
> >> Hi,
> >>
> >> I feel there are more disadvantages than advantages in this approach. In
> >> your current scenario you want to set dictionary only for columns which
> >> are used as filters, but the usage of dictionary is not limited to
> >> filters; it can also reduce the store size and improve aggregation
> >> queries.
> >> I think you should set no_inverted_index false on non-filtered columns
> >> to reduce the store size and improve the performance.
> >>
> >> If we make no dictionary the default, then the user does not need to set
> >> those columns in the DDL, but the user needs to set the dictionary
> >> columns instead. If the user wants to set more dictionary columns, then
> >> the same problem you mentioned arises again, so it does not solve the
> >> problem. I feel we should give more flexibility in our DDL to simplify
> >> these scenarios, and we should have more discussion on it.
> >>
> >> Pros & Cons of your suggestion.
> >> Advantages :
> >> 1. Decoding/Encoding of dictionary could be avoided.
> >>
> >> Disadvantages :
> >> 1. Store size will increase drastically.
> >> 2. IO will increase so query performance will come down.
> >> 3. Aggregation queries performance will suffer.
> >>
> >>
> >>
> >> Regards,
> >> Ravindra.
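
To put rough numbers on the store-size argument above, here is a toy cost model in Python. It is a sketch under stated assumptions, not CarbonData's storage format: `raw_size` counts UTF-8 bytes of every value, and `dictionary_size` counts one copy of each distinct value plus bit-packed surrogate keys. Both helpers and both sample columns are hypothetical.

```python
import math


def raw_size(values):
    """Bytes needed to store every value as-is (no dictionary)."""
    return sum(len(v.encode("utf-8")) for v in values)


def dictionary_size(values):
    """Bytes for one copy of each distinct value plus bit-packed keys."""
    distinct = set(values)
    bits_per_key = max(1, math.ceil(math.log2(len(distinct))))
    key_bytes = math.ceil(len(values) * bits_per_key / 8)
    dict_bytes = sum(len(v.encode("utf-8")) for v in distinct)
    return key_bytes + dict_bytes


# Low cardinality: 3 distinct values repeated over 3000 rows.
low = ["telecom", "bank", "traffic"] * 1000
# High cardinality: every one of the 3000 rows is unique.
high = [f"id-{i:06d}" for i in range(3000)]

print(raw_size(low), dictionary_size(low))
print(raw_size(high), dictionary_size(high))
```

Under this model the low-cardinality column shrinks sharply with a dictionary, while the all-unique column gets larger, so the size impact of a no dictionary default depends heavily on the cardinality of the columns it applies to.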
> >>
> >> On 26 February 2017 at 20:04, bill.zhou <zgcsky08@> wrote:
> >>
> >> > Hi all,
> >> > Currently, when creating a CarbonData table, if a dimension is not
> >> > added to the dictionary_exclude property, the dimension is treated as
> >> > dictionary by default. I think the default should be no dictionary.
> >> >
> >> > For example, when I did a POC for one customer, the table had 300
> >> > columns and 200 dimensions, but only 5 columns were used for filters,
> >> > so he only needs to set these 5 columns as dictionary and leave the
> >> > other 195 columns as no dictionary.
> >> > But now he needs to specify the 195 columns in the dictionary_exclude
> >> > property, which wastes time and makes the create table command huge,
> >> > and will also impact the load performance.
> >> >
> >> > So I suggest the dimension default should be no dictionary; this can
> >> > also help customers easily see which dictionary columns are useful.
> >> >
> >> >
> >> >
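
The DDL size difference described above can be sketched by generating both statements in Python. The table name, column names and types are hypothetical, and the `STORED BY 'carbondata'` / `TBLPROPERTIES` spelling is an assumption that should be checked against the CarbonData documentation for the version in use; only the dictionary_exclude and dictionary_include property names come from this thread.

```python
def create_table_ddl(table, columns, prop_name, prop_columns):
    """Build a CREATE TABLE statement with one dictionary-related property."""
    col_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    prop_value = ",".join(prop_columns)
    return (
        f"CREATE TABLE {table} ({col_defs}) "
        f"STORED BY 'carbondata' "
        f"TBLPROPERTIES ('{prop_name}'='{prop_value}')"
    )


# Hypothetical schema mirroring the POC: 200 dimensions and 100 measures.
dimensions = [(f"dim_{i}", "STRING") for i in range(200)]
measures = [(f"msr_{i}", "DOUBLE") for i in range(100)]
columns = dimensions + measures

filter_dims = [name for name, _ in dimensions[:5]]
other_dims = [name for name, _ in dimensions[5:]]

# Current default (dictionary unless excluded): list the 195 other dimensions.
ddl_current = create_table_ddl(
    "poc_table", columns, "DICTIONARY_EXCLUDE", other_dims
)
# Proposed default (no dictionary unless included): list only the 5 filter dims.
ddl_proposed = create_table_ddl(
    "poc_table", columns, "DICTIONARY_INCLUDE", filter_dims
)

print(len(other_dims), len(filter_dims))
print(len(ddl_current) - len(ddl_proposed))
```

The column definitions are identical in both statements, so the whole difference in length comes from the property value: 195 names in one case, 5 in the other.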
> >> > --
> >> > View this message in context:
> >> > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010.html
> >> > Sent from the Apache CarbonData Mailing List archive mailing list
> >> > archive at Nabble.com.
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> Ravi
> >>
>
>
>
>
>
> --
> View this message in context:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010p8027.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>