http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Carbon-Local-Dictionary-Support-tp51447p51542.html
Please find the link to the design doc.
> Hi Community,
>
> Please find attached the Local Dictionary support design document. Please
> let me know if any part of the design needs further clarification.
> Any further inputs or improvements are most welcome.
>
>
>
> -Regards
> Kumar Vishal
>
> On Tue, Jun 5, 2018 at 6:14 PM, Jacky Li <[hidden email]> wrote:
>
>> +1
>> Good feature to add in CarbonData
>>
>> Regards,
>> Jacky
>>
>>
>> > On Jun 4, 2018, at 11:10 PM, Kumar Vishal <[hidden email]> wrote:
>> >
>> > Hi Community,
>> >
>> > Currently CarbonData supports either global dictionary or No-Dictionary
>> > (plain text stored in LV format) for storing dimension column data.
>> >
>> > *Bottlenecks with Global Dictionary*
>> >
>> > 1. The dictionary file is mutable, so global dictionary cannot be
>> >    supported in storage environments that do not support append.
>> > 2. It is difficult for the user to decide whether a column should be a
>> >    dictionary column when the table has a large number of columns.
>> > 3. Global dictionary generation generally slows down the load process.
>> >
>> > *Bottlenecks with No-Dictionary*
>> >
>> > 1. Storage size is high.
>> > 2. Queries on No-Dictionary columns are slower because more data is
>> >    read and processed.
>> > 3. Filtering is slower on No-Dictionary columns because the number of
>> >    comparisons is high.
>> > 4. Memory footprint is high.
>> >
>> > The above bottlenecks can be addressed by *generating a local dictionary
>> > for low-cardinality columns at each blocklet level*, which provides the
>> > following benefits:
>> >
>> > 1. Dictionary generation can be supported on any storage environment,
>> >    irrespective of whether it supports append on files.
>> > 2. It removes the extra read/write IO on the dictionary files that are
>> >    generated in the case of global dictionary.
>> > 3. Users no longer have to identify the dictionary columns when a table
>> >    has a large number of columns.
>> > 4. It gives better compression on low-cardinality dimension columns.
>> > 5. Filter queries on No-Dictionary columns with a local dictionary will
>> >    be faster, because filtering is done on encoded data.
>> > 6. It reduces the store size and memory footprint, because only unique
>> >    values are stored in the local dictionary and the column data is
>> >    stored in encoded form.
>> >
>> > Please share your comments. Suggestions from the community are most
>> > welcome. Please let me know if anything needs clarification.
>> >
>> > -Regards
>> > Kumar Vishal
>
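
To make the proposal above more concrete, here is a minimal sketch of blocklet-level dictionary encoding and of filtering on the encoded data. It is an illustration only, not CarbonData's implementation or API: the function names, the `MAX_DICT_SIZE` threshold, and the fallback behaviour are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical threshold: above this many distinct values in one blocklet,
# the column is treated as high cardinality and left unencoded.
MAX_DICT_SIZE = 4


def encode_blocklet(
    values: List[str], max_dict_size: int = MAX_DICT_SIZE
) -> Optional[Tuple[List[str], List[int]]]:
    """Build a dictionary local to one blocklet.

    Returns (dictionary, encoded_rows), or None when the blocklet has more
    distinct values than max_dict_size.
    """
    codes_by_value: Dict[str, int] = {}
    encoded: List[int] = []
    for value in values:
        code = codes_by_value.get(value)
        if code is None:
            if len(codes_by_value) >= max_dict_size:
                return None  # too many distinct values: keep plain values
            code = len(codes_by_value)
            codes_by_value[value] = code
        encoded.append(code)
    # Dicts preserve insertion order, so position i holds the value of code i.
    dictionary = list(codes_by_value)
    return dictionary, encoded


def filter_equals(dictionary: List[str], encoded: List[int], literal: str) -> List[int]:
    """Return matching row positions by comparing integer codes, not strings."""
    if literal not in dictionary:
        return []  # the literal never occurs in this blocklet
    target = dictionary.index(literal)
    return [row for row, code in enumerate(encoded) if code == target]
```

In this sketch each unique value is stored once per blocklet and the rows hold small integer codes, which is where the size and memory savings would come from. An equality filter looks the literal up once in the dictionary and then compares integers, and a blocklet whose dictionary does not contain the literal can be skipped without scanning its rows.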