SDK supports LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE

xubo245
When users of the SDK want to use the local dictionary, they cannot use
LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE, because the SDK only
supports local_dictionary_threshold and local_dictionary_enable.

So we should support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE
in the SDK, so that users can include or exclude specific columns.

JIRA: https://issues.apache.org/jira/browse/CARBONDATA-3151



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: SDK support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE

kumarvishal09
-1

We are planning to remove INCLUDE and EXCLUDE for the local dictionary from
the carbon session, as exposing too many properties will confuse users;
it's better to keep it simple by handling this internally.
Without exposing a new property to the user, the current code can still
handle it through the fallback mechanism, so I do not think this is required.

-Regards
Kumar Vishal

(In reply to xubo245, Thu, Dec 6, 2018 at 4:52 PM)

Re: SDK support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE

ravipesala

I agree with @kumarvishal: better not to add more options, as that confuses
the user. It is better to fall back automatically depending on the size of
the dictionary.
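
As a rough illustration of a size-based fallback, the Python sketch below
dictionary-encodes a column page and switches to plain values once the number
of distinct values would exceed a threshold. This is not CarbonData's
implementation: the function `encode_column_page`, its return shape, and the
way `threshold` stands in for local_dictionary_threshold are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple, Union

Encoded = Tuple[str, Union[List[int], List[str]], List[str]]


def encode_column_page(values: Sequence[str], threshold: int) -> Encoded:
    """Dictionary-encode `values`, falling back to plain values.

    Returns (mode, data, dictionary). Hypothetical helper for illustration.
    """
    dictionary: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        code = dictionary.get(value)
        if code is None:
            if len(dictionary) >= threshold:
                # Fallback: the dictionary built so far is discarded and the
                # page is kept as plain values.
                return ("plain", list(values), [])
            code = len(dictionary)
            dictionary[value] = code
        codes.append(code)
    # Dicts keep insertion order, so position in this list equals the code.
    return ("dictionary", codes, list(dictionary))


print(encode_column_page(["a", "b", "a"], threshold=2))
print(encode_column_page(["a", "b", "c"], threshold=2))
```

The first call stays within the threshold and returns codes plus the
dictionary; the second sees a third distinct value and falls back. In this
sketch the work spent building the dictionary before the fallback is thrown
away, so columns that fall back cost more than columns that never build a
dictionary.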




Re: SDK support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE

sraghunandan
@kumar vishal What is the fallback performance when a larger number of
columns need to fall back? Would it not increase the overhead of generating
a temporary dictionary and then discarding it?

(In reply to ravipesala, Fri, 7 Dec 2018, 12:56 pm)

Re: SDK support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE

kumarvishal09
@Raghunandan subramanya
We tested with *80 string columns with 10 high-cardinality columns
(fallback happened for these columns)*. Please find the stats:

*Test results are for 1 billion records, 385 GB in size*

*1. Load time without local dictionary:* 66 minutes
*2. Load time with local dictionary, without fallback:* 72 minutes
*3. Load time with local dictionary, with fallback:* 74 minutes

*Local dictionary without fallback:* 9.09% degradation
*Local dictionary with fallback:* 13.63% degradation
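
For reference, the degradation figures appear to be computed relative to the
66-minute load without local dictionary. The short Python sketch below redoes
that arithmetic; the function name `degradation_percent` is only illustrative.
With the times as posted, the first figure comes out at 9.09%, while a
74-minute load gives about 12.12%. The posted 13.63% would match a 75-minute
load, so one of those two numbers may be a typo.

```python
def degradation_percent(baseline_minutes: float, measured_minutes: float) -> float:
    """Percentage slowdown of a load relative to the baseline load time."""
    return (measured_minutes - baseline_minutes) / baseline_minutes * 100.0


BASELINE = 66  # minutes, load without local dictionary

for label, minutes in [("without fallback", 72), ("with fallback", 74)]:
    print(f"{label}: {degradation_percent(BASELINE, minutes):.2f}%")
```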

-Regards
Kumar Vishal

(In reply to Raghunandan S, Fri, Dec 7, 2018 at 12:59 PM)

Re: SDK support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE

xubo245
Do different data types affect performance? Have you tested with long
string columns?


