Apache CarbonData Dev Mailing List archive

Re: [DISCUSSION] Page Level Bloom Filter

Posted by ravipesala on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Page-Level-Bloom-Filter-tp85720p87415.html

Hi Manhua,

The main problem with this approach is that we cannot save any IO, as our IO
unit is the blocklet, not the page. Once the blocklet is already in memory, I
really don’t think a page-level bloom filter can give us much performance. I
feel the solution would be efficient only if it saves IO somewhere.

Our min/max index is efficient because it can prune files on the driver side
and prune blocklets and pages on the executor side. It actually saves a lot
of IO.
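A minimal sketch of this multi-level min/max pruning, in Python, over a hypothetical and much simplified model of the block/blocklet/page hierarchy (the class and function names are illustrative, not CarbonData's API). Skipping a blocklet avoids reading it at all; skipping a page only avoids work on data that has already been read.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical, simplified model: block (file) -> blocklets (the IO unit)
# -> pages, each carrying min/max statistics for a single column.
@dataclass
class Page:
    min_val: int
    max_val: int


@dataclass
class Blocklet:
    min_val: int
    max_val: int
    pages: List[Page] = field(default_factory=list)


@dataclass
class Block:
    min_val: int
    max_val: int
    blocklets: List[Blocklet] = field(default_factory=list)


def may_contain(min_val: int, max_val: int, value: int) -> bool:
    """An equality filter can only match if value lies within [min, max]."""
    return min_val <= value <= max_val


def prune_min_max(blocks: List[Block], value: int) -> List[Tuple[int, int, int]]:
    """Return (block, blocklet, page) indices that survive min/max pruning."""
    survivors = []
    for bi, block in enumerate(blocks):
        # Driver side: skip whole files without touching them.
        if not may_contain(block.min_val, block.max_val, value):
            continue
        for li, blocklet in enumerate(block.blocklets):
            # Executor side: skipping a blocklet avoids reading it at all.
            if not may_contain(blocklet.min_val, blocklet.max_val, value):
                continue
            for pi, page in enumerate(blocklet.pages):
                # Executor side: skip pages of a blocklet that was already read.
                if may_contain(page.min_val, page.max_val, value):
                    survivors.append((bi, li, pi))
    return survivors


blocks = [
    Block(0, 99, [
        Blocklet(0, 49, [Page(0, 24), Page(25, 49)]),
        Blocklet(50, 99, [Page(50, 74), Page(75, 99)]),
    ]),
    Block(100, 199, [
        Blocklet(100, 199, [Page(100, 149), Page(150, 199)]),
    ]),
]
print(prune_min_max(blocks, 30))  # [(0, 0, 1)]
```

For the value 30, the second file is pruned on the driver, the second blocklet of the first file is never read, and only one page of the remaining blocklet is scanned.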

Supporting bloom filters at the CarbonData file and index levels is a better
approach than supporting them only at the page level. My intention is that it
should behave just like the min/max index, so that we can prune the data at
multiple levels.
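The same multi-level idea applied to bloom filters could look like the sketch below: one filter per block, per blocklet and per page, built once at load time and checked top-down at query time. Everything here is a hypothetical illustration (a hand-rolled Bloom filter using double hashing over SHA-256, made-up filter sizes in `LEVELS`), not CarbonData's bloom implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter using double hashing over SHA-256.
    Illustrative only; not CarbonData's implementation."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


# Hypothetical (bits, hashes) per level: smallest at block level because it
# is held in driver memory, larger and more precise further down.
LEVELS = {"block": (256, 2), "blocklet": (512, 3), "page": (1024, 4)}


def make_filter(values, num_bits, num_hashes):
    bf = BloomFilter(num_bits, num_hashes)
    for v in values:
        bf.add(v)
    return bf


def build_index(data):
    """data[block][blocklet][page] is a list of column values.
    Returns nested (filter, children) tuples, built once at load time."""
    index = []
    for block in data:
        blocklets = []
        for blocklet in block:
            pages = [make_filter(page, *LEVELS["page"]) for page in blocklet]
            blocklet_vals = [v for page in blocklet for v in page]
            blocklets.append((make_filter(blocklet_vals, *LEVELS["blocklet"]), pages))
        block_vals = [v for blocklet in block for page in blocklet for v in page]
        index.append((make_filter(block_vals, *LEVELS["block"]), blocklets))
    return index


def prune_with_bloom(index, value):
    survivors = []
    for bi, (block_bf, blocklets) in enumerate(index):
        if not block_bf.might_contain(value):  # driver side
            continue
        for li, (blocklet_bf, pages) in enumerate(blocklets):
            if not blocklet_bf.might_contain(value):  # executor side, saves IO
                continue
            for pi, page_bf in enumerate(pages):
                if page_bf.might_contain(value):  # executor side, after IO
                    survivors.append((bi, li, pi))
    return survivors


data = [
    [[["a", "b"], ["c", "d"]], [["e", "f"], ["g", "h"]]],
    [[["i", "j"], ["k", "l"]]],
]
index = build_index(data)
print(prune_with_bloom(index, "g"))  # always includes (0, 1, 1)
```

Because a Bloom filter has no false negatives, the page that really holds a value always survives; false positives only cost extra reads, never wrong results.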

On the driver side, at the block level, we can keep a less precise bloom filter
with fewer hash functions to control its size, since we load it into memory. At
the blocklet level we can make it a little more precise, with more hash
functions, for better pruning, and at the page level we can make it more
precise still to get much better pruning ability.
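As a rough illustration of this trade-off, the standard Bloom filter sizing formulas give, for n distinct values and a target false-positive probability p, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. A minimal Python sketch, with hypothetical per-level targets (these are not CarbonData defaults):

```python
import math
from typing import Tuple


def bloom_size(n: int, p: float) -> Tuple[int, int]:
    """Optimal bit count m and hash count k for n distinct values at
    false-positive probability p (standard Bloom filter formulas)."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


# Hypothetical targets: coarse at block level (driver memory),
# finer at blocklet and page level (executor side).
n = 1_000_000  # hypothetical number of distinct values covered by one filter
for level, p in [("block", 0.1), ("blocklet", 0.01), ("page", 0.001)]:
    m, k = bloom_size(n, p)
    print(f"{level:9s} p={p:<6} bits/value={m / n:.2f} hashes={k}")
```

Each tenfold reduction in false-positive probability costs roughly 4.8 extra bits per distinct value and about 3.3 extra hash functions, which is why keeping the driver-side filter coarse is what keeps its memory footprint small.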

Regards,
Ravindra.
