Apache CarbonData Dev Mailing List archive

[New Feature] Range Filter Optimization

Classic

List

Threaded

3 messages Options

sounak

[New Feature] Range Filter Optimization

Hi,

Currently I am working in a RANGE Filter Optimization. Previously if in
case there are both Greater Than and Less than predicates on a same column,
then they are evaluated spearately where multiple times scanning of the
blocks are involved which is undoubtedly costly.

To optimize this, if "Greater and Less than" predicates are there the a
same column and joined by AND expression, then these two expression can be
merged into a single expression and can form Range Expression. Rather than
evaluating the expressions separately only one time the block will be
scanned and if the values lies within Range Min and Max then will be
selected.

Current the filter expression tree is transformed as shown below. The below
tranformation helps us to carry the Range MIN MAX value so that prunning
optimization is applied.

// AND AND
// | |
// / \ / \
// / \ / \
// Less Greater => TRUE Range
// / \ / \ / \
// / \ / \ / \
// a 10 a 5 Less greater
// /\ /\
// / \ / \
// a 10 a 5

Presence of a Range in a single expression gives few more optimization
during block prunning. Comparing Filter Min and Max and Block Min and Max
values few decisions can be taken as shown below.

Case A) In case the Filter Range lies completely outside of Block range =>
Then no scanning is required and the block can be completely skipped.

-----------------------------------

| BLOCK |

-----------------------------------
-----------------------------
Min MAX
------------------------------------
| FILTER |
or |
FILTER |
------------------------------

------------------------------------
Min Max
Min
Max

Case B ) The Filter Completely Covers the Block => No scanning is required.
Select all Rows of the Block. i.e. turn all bits of block to true.

--------------------------------------

| BLOCK |

--------------------------------------

------------------------------------------------------------------
|
FILTER |

------------------------------------------------------------------

Case C) The Filter Overlaps the Block, But completely donot cover it. =>
Choose the Block Scan Start or End based on filter overlaping, no need to
do a binary search to find the Start and End Index of Block.

-------------------------------------
------------------------------------------
| BLOCK
| |
BLOCK |

-------------------------------------
-----------------------------------------
--------------------------------------------
OR
-------------------------------------------------
| FILTER |

| FILTER |
-------------------------------------------

-------------------------------------------------
Start of Block will be default
StartIndex Value End of
Blcok will be default EndIndex Value.

Along with these optimization in scanning few optimization are done during
filter expression tree formation.
Case A) Merging of Redundant Greater than Less Than filter predicates to a
Range.
Case B) Multiple Range Filters can be merged into one.

With this optimization the most benefitted candidates are Direct Dictionary
and No Dictionary Columns. For Dictionary we only get the surrogate Keys
withing the Range.

Please let me know your thoughts in this project.
--
Thanks
Sounak

kumarvishal09

Re: [New Feature] Range Filter Optimization

+1.

This will help in range query as only one time filter will be applied and
range of binary search will decrease as we can do incremental binary
search. Currently in carbon range filter for dictionary column is slow as
number of filter applied can be more and it degrade the filter query
performance, above optimisation will improve the dictionary column also.

-Regards
Kumar Vishal

On Tue, Mar 21, 2017 at 9:14 AM, sounak <[hidden email]> wrote:

> Hi,
>
> Currently I am working in a RANGE Filter Optimization. Previously if in
> case there are both Greater Than and Less than predicates on a same column,
> then they are evaluated spearately where multiple times scanning of the
> blocks are involved which is undoubtedly costly.
>
> To optimize this, if "Greater and Less than" predicates are there the a
> same column and joined by AND expression, then these two expression can be
> merged into a single expression and can form Range Expression. Rather than
> evaluating the expressions separately only one time the block will be
> scanned and if the values lies within Range Min and Max then will be
> selected.
>
> Current the filter expression tree is transformed as shown below. The below
> tranformation helps us to carry the Range MIN MAX value so that prunning
> optimization is applied.
>
> // AND AND
> // | |
> // / \ / \
> // / \ / \
> // Less Greater => TRUE Range
> // / \ / \ / \
> // / \ / \ / \
> // a 10 a 5 Less greater
> // /\
> /\
> // / \
> / \
> // a 10 a
> 5
>
>
> Presence of a Range in a single expression gives few more optimization
> during block prunning. Comparing Filter Min and Max and Block Min and Max
> values few decisions can be taken as shown below.
>
> Case A) In case the Filter Range lies completely outside of Block range =>
> Then no scanning is required and the block can be completely skipped.
>
> -----------------------------------
>
> | BLOCK |
>
> -----------------------------------
> -----------------------------
> Min MAX
> ------------------------------------
> | FILTER |
> or |
> FILTER |
> ------------------------------
>
> ------------------------------------
> Min Max
> Min
> Max
>
> Case B ) The Filter Completely Covers the Block => No scanning is required.
> Select all Rows of the Block. i.e. turn all bits of block to true.
>
> --------------------------------------
>
> | BLOCK |
>
> --------------------------------------
>
> ------------------------------------------------------------------
> |
> FILTER |
>
> ------------------------------------------------------------------
>
> Case C) The Filter Overlaps the Block, But completely donot cover it. =>
> Choose the Block Scan Start or End based on filter overlaping, no need to
> do a binary search to find the Start and End Index of Block.
>
>
> -------------------------------------
> ------------------------------------------
> | BLOCK
> | |
> BLOCK |
>
> -------------------------------------
> -----------------------------------------
> --------------------------------------------
> OR
> -------------------------------------------------
> | FILTER |
>
> | FILTER |
> -------------------------------------------
>
> -------------------------------------------------
> Start of Block will be default
> StartIndex Value End of
> Blcok will be default EndIndex Value.
>
> Along with these optimization in scanning few optimization are done during
> filter expression tree formation.
> Case A) Merging of Redundant Greater than Less Than filter predicates to a
> Range.
> Case B) Multiple Range Filters can be merged into one.
>
> With this optimization the most benefitted candidates are Direct Dictionary
> and No Dictionary Columns. For Dictionary we only get the surrogate Keys
> withing the Range.
>
> Please let me know your thoughts in this project.
> --
> Thanks
> Sounak
>

kumar vishal

David CaiQiang

Re: [New Feature] Range Filter Optimization

In reply to this post by sounak

+1

Best Regards
David QiangCai

Best Regards
David Cai