Hi,
Currently I am working in a RANGE Filter Optimization. Previously if in case there are both Greater Than and Less than predicates on a same column, then they are evaluated spearately where multiple times scanning of the blocks are involved which is undoubtedly costly. To optimize this, if "Greater and Less than" predicates are there the a same column and joined by AND expression, then these two expression can be merged into a single expression and can form Range Expression. Rather than evaluating the expressions separately only one time the block will be scanned and if the values lies within Range Min and Max then will be selected. Current the filter expression tree is transformed as shown below. The below tranformation helps us to carry the Range MIN MAX value so that prunning optimization is applied. // AND AND // | | // / \ / \ // / \ / \ // Less Greater => TRUE Range // / \ / \ / \ // / \ / \ / \ // a 10 a 5 Less greater // /\ /\ // / \ / \ // a 10 a 5 Presence of a Range in a single expression gives few more optimization during block prunning. Comparing Filter Min and Max and Block Min and Max values few decisions can be taken as shown below. Case A) In case the Filter Range lies completely outside of Block range => Then no scanning is required and the block can be completely skipped. ----------------------------------- | BLOCK | ----------------------------------- ----------------------------- Min MAX ------------------------------------ | FILTER | or | FILTER | ------------------------------ ------------------------------------ Min Max Min Max Case B ) The Filter Completely Covers the Block => No scanning is required. Select all Rows of the Block. i.e. turn all bits of block to true. -------------------------------------- | BLOCK | -------------------------------------- ------------------------------------------------------------------ | FILTER | ------------------------------------------------------------------ Case C) The Filter Overlaps the Block, But completely donot cover it. => Choose the Block Scan Start or End based on filter overlaping, no need to do a binary search to find the Start and End Index of Block. ------------------------------------- ------------------------------------------ | BLOCK | | BLOCK | ------------------------------------- ----------------------------------------- -------------------------------------------- OR ------------------------------------------------- | FILTER | | FILTER | ------------------------------------------- ------------------------------------------------- Start of Block will be default StartIndex Value End of Blcok will be default EndIndex Value. Along with these optimization in scanning few optimization are done during filter expression tree formation. Case A) Merging of Redundant Greater than Less Than filter predicates to a Range. Case B) Multiple Range Filters can be merged into one. With this optimization the most benefitted candidates are Direct Dictionary and No Dictionary Columns. For Dictionary we only get the surrogate Keys withing the Range. Please let me know your thoughts in this project. -- Thanks Sounak |
+1.
This will help in range query as only one time filter will be applied and range of binary search will decrease as we can do incremental binary search. Currently in carbon range filter for dictionary column is slow as number of filter applied can be more and it degrade the filter query performance, above optimisation will improve the dictionary column also. -Regards Kumar Vishal On Tue, Mar 21, 2017 at 9:14 AM, sounak <[hidden email]> wrote: > Hi, > > Currently I am working in a RANGE Filter Optimization. Previously if in > case there are both Greater Than and Less than predicates on a same column, > then they are evaluated spearately where multiple times scanning of the > blocks are involved which is undoubtedly costly. > > To optimize this, if "Greater and Less than" predicates are there the a > same column and joined by AND expression, then these two expression can be > merged into a single expression and can form Range Expression. Rather than > evaluating the expressions separately only one time the block will be > scanned and if the values lies within Range Min and Max then will be > selected. > > Current the filter expression tree is transformed as shown below. The below > tranformation helps us to carry the Range MIN MAX value so that prunning > optimization is applied. > > // AND AND > // | | > // / \ / \ > // / \ / \ > // Less Greater => TRUE Range > // / \ / \ / \ > // / \ / \ / \ > // a 10 a 5 Less greater > // /\ > /\ > // / \ > / \ > // a 10 a > 5 > > > Presence of a Range in a single expression gives few more optimization > during block prunning. Comparing Filter Min and Max and Block Min and Max > values few decisions can be taken as shown below. > > Case A) In case the Filter Range lies completely outside of Block range => > Then no scanning is required and the block can be completely skipped. > > ----------------------------------- > > | BLOCK | > > ----------------------------------- > ----------------------------- > Min MAX > ------------------------------------ > | FILTER | > or | > FILTER | > ------------------------------ > > ------------------------------------ > Min Max > Min > Max > > Case B ) The Filter Completely Covers the Block => No scanning is required. > Select all Rows of the Block. i.e. turn all bits of block to true. > > -------------------------------------- > > | BLOCK | > > -------------------------------------- > > ------------------------------------------------------------------ > | > FILTER | > > ------------------------------------------------------------------ > > Case C) The Filter Overlaps the Block, But completely donot cover it. => > Choose the Block Scan Start or End based on filter overlaping, no need to > do a binary search to find the Start and End Index of Block. > > > ------------------------------------- > ------------------------------------------ > | BLOCK > | | > BLOCK | > > ------------------------------------- > ----------------------------------------- > -------------------------------------------- > OR > ------------------------------------------------- > | FILTER | > > | FILTER | > ------------------------------------------- > > ------------------------------------------------- > Start of Block will be default > StartIndex Value End of > Blcok will be default EndIndex Value. > > Along with these optimization in scanning few optimization are done during > filter expression tree formation. > Case A) Merging of Redundant Greater than Less Than filter predicates to a > Range. > Case B) Multiple Range Filters can be merged into one. > > With this optimization the most benefitted candidates are Direct Dictionary > and No Dictionary Columns. For Dictionary we only get the surrogate Keys > withing the Range. > > Please let me know your thoughts in this project. > -- > Thanks > Sounak >
kumar vishal
|
In reply to this post by sounak
+1
Best Regards David QiangCai
Best Regards
David Cai |
Free forum by Nabble | Edit this page |