http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Greater-than-less-than-Like-filters-optmization-tp4134p4906.html
> Hi Sujith,
> +1 I think this will be a good optimization for dictionary column.
>
> -Regards
> Kumar Vishal
>
> On Mon, Dec 12, 2016 at 3:26 AM, sujith chacko <
>
[hidden email]>
> wrote:
>
> > Hi All,
> >
> > I am having a suggestion for improving the filter queries which require
> > expression evaluation for
> > identifying its dictionary value.
> >
> > *Current design *
> > In *greater than/less-than/Like* *filters*, system first iterates each
> row
> > present in the dictionary cache for identifying valid filter actual
> members
> > by applying the filter expression , once evaluation done system will
> hold
> > the list of identified valid filter actual member values(String), now in
> > next step again system will look up the dictionary cache in order to
> > identify the dictionary surrogate values of the identified members. this
> > look up is an additional cost to our system even though the look up
> > methodology is an binary search in dictionary cache.
> >
> > *Proposed design/solution:*
> > *Identify the dictionary surrogate values in filter expression evaluation
> > step itself when actual dictionary values will be scanned for
> identifying
> > valid filter members .*
> >
> > Keep a dictionary counter variable which will be increased when system
> > iterates through the dictionary cache in order to retrieve each actual
> > member stored in dictionary cache , after this system will evaluate each
> > row against the filter expression to identify whether its a valid filter
> > member or not, while doing this process itself counter value can be taken
> > as valid selected dictionary value since the actual member values and
> > its dictionary
> > values will be kept in same order in dictionary cache as the iteration
> > order.
> >
> > *thus it will eliminate the further dictionary look up* *step *which is
> > required to retrieve the dictionary surrogate value against identified
> > actual valid filter member. this can also increase significantly the
> filter
> > query performance of such filter queries which require expression
> > evaluation to identify it the filter members by looking up dictionary
> > cache, like *greater than/less-than/Like* filters .
> >
> > *Note : this optimization is applicable for dictionary columns.*
> >
> > Please let me know for valid inputs/suggestions.
> >
> > Thanks,
> > Sujith
> >
>