Hi Dev
Currently we support the LOCAL DICTIONARY feature during the data load operation. The feature is very helpful because it reduces the store size, which reduces IO and thereby improves query performance.

*This proposal is to extend the LOCAL DICTIONARY feature and provide separate DDL and offline support for it. This will make usage of the feature more flexible. The reasons for proposing this are*:

1. DDL support, which can enable stores without local dictionary to add this feature for already loaded data. This can help customers leverage the LOCAL DICTIONARY feature for data that was written in carbondata format without local dictionary.
2. We know that when local dictionary is enabled there is a degradation, though small, in data load performance. So there can be applications/customers who want to fine-tune the loaded data during off-peak time. This feature can be helpful in those kinds of scenarios.
3. Offline support is proposed for SDK-like features, where we do not have the Spark driver-executor model and there may be only a single thread used for loading data. For this scenario we can provide offline support, thereby not impacting the existing data load performance.

Please let me know your suggestions for this proposal. If most of the community members feel the idea is good and that it will make usage of this feature more flexible, I can come up with a design and discuss it further on this platform.

Regards
Manish Gupta
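For readers less familiar with the feature under discussion, here is a minimal sketch of the general idea behind a per-column dictionary with a distinct-value threshold: repeated strings are replaced by small integer codes, and encoding is abandoned when the column has too many distinct values. This is an illustration only, not CarbonData's implementation; the class name, method names, and the exact fallback rule are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: NOT CarbonData's implementation.
// All names here are hypothetical.
class LocalDictionarySketch {

    /** Result of trying to dictionary-encode one column chunk. */
    static final class EncodedColumn {
        final List<String> dictionary; // distinct values in first-seen order
        final int[] codes;             // one code per row, or null on fallback

        EncodedColumn(List<String> dictionary, int[] codes) {
            this.dictionary = dictionary;
            this.codes = codes;
        }

        boolean isDictionaryEncoded() {
            return codes != null;
        }
    }

    /**
     * Encodes the values as integer codes. If the number of distinct
     * values would exceed the threshold, gives up and signals fallback
     * (assumed rule: the caller then stores the raw values instead).
     */
    static EncodedColumn encode(List<String> values, int threshold) {
        Map<String, Integer> index = new LinkedHashMap<>();
        int[] codes = new int[values.size()];
        for (int row = 0; row < values.size(); row++) {
            String value = values.get(row);
            Integer code = index.get(value);
            if (code == null) {
                if (index.size() >= threshold) {
                    // Too many distinct values: dictionary would not pay off.
                    return new EncodedColumn(new ArrayList<>(), null);
                }
                code = index.size();
                index.put(value, code);
            }
            codes[row] = code;
        }
        return new EncodedColumn(new ArrayList<>(index.keySet()), codes);
    }

    public static void main(String[] args) {
        EncodedColumn country = encode(Arrays.asList("cn", "us", "cn", "cn"), 10);
        System.out.println("dictionary = " + country.dictionary);
        System.out.println("codes      = " + Arrays.toString(country.codes));
    }
}
```

The storage saving comes from writing each distinct string once and a compact code per row; the cost is the extra lookup work at load time, which is the load-performance trade-off mentioned in point 2.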
+1
Yes, I think the SDK should also provide local dictionary support.

Regards,
Jacky

> On November 5, 2018, at 2:14 PM, manish gupta <[hidden email]> wrote:
The SDK already supports local dictionary:

org.apache.carbondata.sdk.file.CarbonWriterBuilder#localDictionaryThreshold
org.apache.carbondata.sdk.file.CarbonWriterBuilder#enableLocalDictionary

But it does not support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE. I think we should support them. There are some users who want to use LOCAL_DICTIONARY_EXCLUDE.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
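To make the INCLUDE/EXCLUDE request concrete, here is a rough sketch of how such column lists could be resolved into the set of columns that get a local dictionary. It is not SDK code: the class and method are hypothetical, and the rules (an empty include list means all eligible string columns, exclude always wins, unknown columns are silently ignored) are simplifying assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT part of the CarbonData SDK.
class LocalDictionaryColumnsSketch {

    /**
     * Resolves which columns would be local-dictionary encoded.
     *
     * Assumed rules for this sketch:
     *  - an empty include list means "all eligible columns";
     *  - included columns that are not eligible are ignored;
     *  - excluded columns are always removed.
     */
    static Set<String> resolve(List<String> eligibleColumns,
                               List<String> include,
                               List<String> exclude) {
        Set<String> result =
            new LinkedHashSet<>(include.isEmpty() ? eligibleColumns : include);
        result.retainAll(eligibleColumns); // keep only eligible columns
        result.removeAll(exclude);         // exclude takes precedence
        return result;
    }

    public static void main(String[] args) {
        List<String> eligible = Arrays.asList("name", "city", "comment");
        // Exclude a column expected to have very high cardinality.
        Set<String> chosen = resolve(eligible,
                                     Collections.<String>emptyList(),
                                     Arrays.asList("comment"));
        System.out.println(chosen);
    }
}
```

With something like this, a user could keep the default behavior for most columns and exclude only those where a dictionary is unlikely to help.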
Does local dictionary harm the performance so that they want to disable it for some specific columns?
Sent from laptop

From: xubo245
Sent: Monday, November 12, 2018 10:05 AM
To: [hidden email]
Subject: Re: [Feature Proposal] Proposal for offline and DDL localdictionary support