[Feature Proposal] Proposal for offline and DDL local dictionary support

Posted by manishgupta88 on Nov 05, 2018; 6:09am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Proposal-Proposal-for-offline-and-DDL-local-dictionary-support-tp67620.html

Hi Dev

Currently we support the LOCAL DICTIONARY feature during the data load
operation. The feature is very helpful because it reduces the store size,
which reduces IO and thereby improves query performance.
*This proposal is to extend the LOCAL DICTIONARY feature and provide separate
DDL and offline support for it. This will make the feature's usage more
flexible. The reasons for proposing this are*:

1. DDL support, which can enable stores without local dictionary to add this
feature for already loaded data. This can help customers leverage the
LOCAL DICTIONARY feature for data that was written in carbondata format
without local dictionary.
2. We know that when local dictionary is enabled there is a degradation,
though small, in data load performance. So there can be
applications/customers who want to fine-tune the loaded data during
off-peak time. This feature can be helpful for those kinds of scenarios.
3. Offline support is proposed for SDK-like features, where we do not have
the Spark driver-executor model and there may be only a single thread used
for loading data. For this scenario we can provide offline support,
thereby not impacting the existing data load performance.

Please let me know your suggestions for this proposal. If most of the
community members feel the idea is good and that it will make the usage of
this feature more flexible, I can come up with a design and discuss it
further on this platform.

Regards
Manish Gupta