When users use the SDK and want to use LOCAL DICTIONARY, they can't use LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE, because the SDK only supports local_dictionary_threshold and local_dictionary_enable.

So we should support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE in the SDK; then users can include only some of the columns or exclude some of the columns.

JIRA: https://issues.apache.org/jira/browse/CARBONDATA-3151

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
|
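To make the request concrete, here is a toy Python model of how include and exclude lists could narrow the set of columns that get a local dictionary. It is only a sketch and not CarbonData code: the function name `resolve_local_dict_columns`, the assumption that only string-like columns are eligible, and the rule that an empty include list means "all eligible columns" are illustrative assumptions.

```python
# Hypothetical model of include/exclude resolution; not the CarbonData implementation.
STRING_LIKE = {"string", "varchar", "char"}  # assumed set of eligible type names


def resolve_local_dict_columns(schema, include=(), exclude=()):
    """Return the column names that would get a local dictionary.

    schema  -- dict mapping column name to a type name
    include -- columns explicitly requested (empty means all eligible columns)
    exclude -- columns explicitly ruled out
    """
    overlap = set(include) & set(exclude)
    if overlap:
        raise ValueError(f"columns in both include and exclude: {sorted(overlap)}")
    # Only string-like columns are considered; other names are silently ignored.
    eligible = [name for name, dtype in schema.items() if dtype in STRING_LIKE]
    selected = [c for c in eligible if c in include] if include else eligible
    return [c for c in selected if c not in exclude]


schema = {"id": "int", "name": "string", "city": "string", "note": "varchar"}
print(resolve_local_dict_columns(schema, include=["name", "city"]))
print(resolve_local_dict_columns(schema, exclude=["note"]))
```

With only the enable flag and the threshold available, neither of these two selections can be expressed; the choice is all eligible columns or none.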
-1
We are planning to remove the include and exclude options for local dictionary from the carbon session, because exposing too many properties will confuse users; it is better to keep it simple by handling this internally. Without exposing a new property to the user, the current code can still handle it through the fallback mechanism, so I do not think this is required.

-Regards
Kumar Vishal

On Thu, Dec 6, 2018 at 4:52 PM xubo245 <[hidden email]> wrote:
> When user use SDK and want to use LOCAL DICTIONARY, they can't use
> LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE because SDK only
> support local_dictionary_threshold and local_dictionary_enable.
>
> So we should support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE
> in SDK, then use can include part of columns or exclude part of columns.
|
I agree with @kumarvishal: better not to add more options, as that confuses the user. We should fall back automatically depending on the size of the dictionary.
|
@kumar vishal what is the fallback performance if a larger number of columns need to fall back? Would it not increase the overhead of generating a temporary dictionary and then discarding it?

On Fri, 7 Dec 2018, 12:56 pm ravipesala, <[hidden email]> wrote:
> I agree with @kumarvishal , better not add more options as it confuses the
> user. We better fallback automatically depends on the size of the
> dictionary.
|
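The overhead being asked about can be pictured with a small Python sketch of threshold-based fallback. This is an illustration under stated assumptions, not CarbonData's implementation: the function `encode_column_page`, its return shape, and the idea of counting discarded entries are all hypothetical.

```python
def encode_column_page(values, threshold):
    """Try dictionary encoding; fall back to plain values past the threshold.

    values    -- list of column values for one page
    threshold -- maximum number of distinct values the dictionary may hold

    Returns (encoding, payload, wasted_entries), where wasted_entries is the
    number of dictionary entries that were built and then thrown away.
    """
    dictionary = {}  # value -> surrogate key
    keys = []
    for value in values:
        key = dictionary.get(value)
        if key is None:
            if len(dictionary) >= threshold:
                # Too many distinct values: discard the temporary dictionary
                # and store the values as they are.
                return "plain", list(values), len(dictionary)
            key = len(dictionary)
            dictionary[value] = key
        keys.append(key)
    return "dictionary", (list(dictionary), keys), 0


print(encode_column_page(["a", "b", "a"], threshold=2))
print(encode_column_page(["a", "b", "c"], threshold=2))
```

In this model the cost of a fallback is the work spent filling the dictionary up to the threshold before it is dropped, which is why the number of high-cardinality columns matters.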
@Raghunandan subramanya <[hidden email]>
We have tested with 80 string columns, of which 10 are high-cardinality columns (fallback happened for these columns). Please find the stats below.

Test result is with 1 billion records, 385 GB in size:

1. Load time without local dictionary: 66 minutes
2. Load time without fallback local dictionary: 72 minutes
3. Load time with fallback local dictionary: 74 minutes

Without fallback local dictionary: 9.09% degradation
With fallback local dictionary: 13.63% degradation

-Regards
Kumar Vishal

On Fri, Dec 7, 2018 at 12:59 PM Raghunandan S <[hidden email]> wrote:
> @kumar vishal what is the fallback performance if more number of columns
> need to fallback. Would it not increase the overhead of generating
> temporary dictionary and discarding it?
>
> On Fri, 7 Dec 2018, 12:56 pm ravipesala, <[hidden email]> wrote:
> > I agree with @kumarvishal , better not add more options as it confuses
> > the user. We better fallback automatically depends on the size of the
> > dictionary.
|
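For readers who want to check the percentages in the post above, the degradation relative to the 66-minute baseline can be recomputed from the reported load times. The helper name `degradation_pct` is made up for this sketch; the three load times are the ones quoted in the thread.

```python
def degradation_pct(baseline_minutes, measured_minutes):
    """Percentage increase in load time relative to the baseline."""
    return (measured_minutes - baseline_minutes) / baseline_minutes * 100


baseline = 66  # load time without local dictionary, in minutes
for label, minutes in [("without fallback", 72), ("with fallback", 74)]:
    print(f"{label}: {degradation_pct(baseline, minutes):.2f}%")
# prints:
# without fallback: 9.09%
# with fallback: 12.12%
```

The first figure matches the post. The second comes out at about 12.1% rather than the quoted 13.63%, which would correspond to roughly 75 minutes; the thread does not say which of the two numbers is the intended one.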
Do different data types affect performance? Have you tested with long string columns?
|