Re: Greater than/less-than/Like filters optmization
Posted by kumarvishal09 on Dec 22, 2016; 1:54am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Greater-than-less-than-Like-filters-optmization-tp4134p4829.html
Hi Sujith,
+1. I think this will be a good optimization for dictionary columns.
-Regards
Kumar Vishal
On Mon, Dec 12, 2016 at 3:26 AM, sujith chacko <[hidden email]> wrote:
> Hi All,
>
> I have a suggestion for improving filter queries that require expression
> evaluation to identify their dictionary values.
>
> *Current design*
> For *greater than/less-than/Like* *filters*, the system first iterates over
> each row in the dictionary cache and applies the filter expression to
> identify the valid filter members. Once evaluation is done, the system
> holds the list of matching actual member values (String). In the next step,
> the system looks up the dictionary cache again to identify the dictionary
> surrogate values of those members. This lookup is an additional cost to our
> system, even though it is a binary search in the dictionary cache.
>
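
To make the extra cost concrete, here is a minimal Python sketch of the
two-step flow described above. The sorted list standing in for the dictionary
cache, the function name and the 1-based surrogate numbering are assumptions
for illustration only, not CarbonData's actual classes.

```python
from bisect import bisect_left

# Hypothetical dictionary cache: members kept sorted, and the surrogate of a
# member is assumed to be its position + 1.
DICTIONARY = ["apple", "banana", "cherry", "mango", "peach"]


def surrogates_two_pass(dictionary, predicate):
    """Current flow: filter the members first, then look each one up again."""
    # Step 1: evaluate the filter expression against every member.
    matched = [member for member in dictionary if predicate(member)]
    # Step 2: one binary search per matched member to recover its surrogate.
    return [bisect_left(dictionary, member) + 1 for member in matched]


# A "greater than" filter: member > 'banana'
print(surrogates_two_pass(DICTIONARY, lambda m: m > "banana"))  # [3, 4, 5]
```

With k matching members out of n, step 2 adds roughly k * log2(n)
comparisons on top of the n filter evaluations done in step 1.
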
> *Proposed design/solution:*
> *Identify the dictionary surrogate values in the filter expression
> evaluation step itself, while the actual dictionary values are being
> scanned to identify valid filter members.*
>
> Keep a dictionary counter variable that is incremented as the system
> iterates through the dictionary cache to retrieve each actual member
> stored there. The system then evaluates each member against the filter
> expression to decide whether it is a valid filter member. During this
> process the counter value can be taken as the selected dictionary value,
> since the actual member values and their dictionary values are kept in the
> dictionary cache in the same order as the iteration order.
>
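
As I read it, the counter idea can be sketched in Python roughly as follows.
The list-based dictionary, the function name and the counter starting at 1
are assumptions made for the example; the only property taken from the
proposal is that iteration order matches surrogate order.

```python
def surrogates_single_pass(dictionary, predicate):
    """Proposed flow: collect surrogates while evaluating the filter."""
    selected = []
    # The counter tracks the position of the member currently being scanned.
    # Starting at 1 is an assumption about how surrogates are numbered.
    for counter, member in enumerate(dictionary, start=1):
        if predicate(member):
            # The counter itself is the surrogate, so no second lookup.
            selected.append(counter)
    return selected


# Hypothetical members, listed in dictionary iteration (surrogate) order.
members = ["delhi", "agra", "pune", "goa"]

# A "less than" filter: member < 'goa'
print(surrogates_single_pass(members, lambda m: m < "goa"))  # [1, 2]
# A LIKE '%a%' style filter
print(surrogates_single_pass(members, lambda m: "a" in m))   # [2, 4]
```

The scan still touches every member once, but the per-match lookup
disappears, and the members do not have to be sorted for this to work.
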
> *This eliminates the further dictionary lookup* *step* that is required to
> retrieve the dictionary surrogate value for each identified valid filter
> member. It can also significantly improve the performance of filter
> queries that require expression evaluation to identify their filter members
> by looking up the dictionary cache, such as *greater than/less-than/Like*
> filters.
>
> *Note: this optimization applies to dictionary columns.*
>
> Please let me know your inputs/suggestions.
>
> Thanks,
> Sujith
>