Apache CarbonData Dev Mailing List archive - RE: [Discussion] Bloom memory and pruning optimisation using hierarchical pruning.

Posted by ravipesala on Dec 03, 2018; 7:05am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Bloom-memory-and-pruning-optimisation-using-hierarchical-pruning-tp69382p69651.html

Hi xuchuanyin,

1. There is no need to maintain a separate bloom configuration for the task
level bloom, as it uses the same configuration (size and fpp) provided by the
user. We simply create the task level bloom with that configuration alongside
the blocklet level bloom.

2. The task level bloom is much smaller than the blocklet level bloom, but
yes, it will also grow over time as the data or the number of tasks
increases. Even so, we can keep it in the driver LRU cache: we may not query
all the data all the time, so the cache holds only the most recently used
blooms. We can also skip driver side bloom pruning and prune only at the
executor side if the bloom is very large.
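
To make points 1 and 2 concrete, here is a minimal Python sketch of driver
side pruning with task level blooms. It is only an illustration under stated
assumptions, not CarbonData code: the names BloomFilter, DriverBloomCache,
prune_tasks_on_driver, capacity_bytes and max_bloom_bytes are all
hypothetical. Every bloom is built from the same (expected items, fpp)
configuration, the driver keeps blooms in a byte-bounded LRU cache, and a
task whose bloom exceeds the size limit is not pruned on the driver and is
left for executor side pruning.

```python
import hashlib
import math
from collections import OrderedDict


class BloomFilter:
    """Minimal Bloom filter sized from an expected item count and a target fpp."""

    def __init__(self, expected_items, fpp):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-expected_items * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))

    def size_in_bytes(self):
        return len(self.bits)


class DriverBloomCache:
    """Hypothetical driver side LRU cache of task level blooms, bounded by bytes."""

    def __init__(self, capacity_bytes, max_bloom_bytes):
        self.capacity_bytes = capacity_bytes
        self.max_bloom_bytes = max_bloom_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # task_id -> BloomFilter, oldest first

    def get(self, task_id, loader):
        if task_id in self.entries:
            self.entries.move_to_end(task_id)  # mark as most recently used
            return self.entries[task_id]
        # A real system would check the stored size before loading; the sketch
        # loads first to stay short.
        bloom = loader(task_id)
        if bloom.size_in_bytes() > self.max_bloom_bytes:
            return None  # too large for the driver: prune on the executor instead
        self.entries[task_id] = bloom
        self.used_bytes += bloom.size_in_bytes()
        # Evict least recently used blooms until the cache fits again.
        while self.used_bytes > self.capacity_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= evicted.size_in_bytes()
        return bloom


def prune_tasks_on_driver(task_ids, value, cache, loader):
    """Keep tasks that may contain `value`, plus tasks the driver cannot check."""
    kept = []
    for task_id in task_ids:
        bloom = cache.get(task_id, loader)
        if bloom is None or bloom.might_contain(value):
            kept.append(task_id)
    return kept
```

The blocklet level blooms inside each surviving task would be built with the
same BloomFilter(expected_items, fpp) arguments and checked on the executor,
which is why no second configuration is needed.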

Yes, we can maintain the bloom at the carbondata footer level like
parquet/orc, but then we lose datamap framework features such as lazy
datamap loading or creation. Instead, we can keep the bloom in separate
files but maintain a footer in the file, as mentioned in my earlier mail.
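
As a rough illustration of a separate bloom file that carries its own footer,
the Python sketch below writes the bloom payloads first, then a footer index,
then the footer length. The layout is purely an assumption made for this
example (JSON footer, 4-byte big-endian length, the function names
write_bloom_file, read_footer and read_bloom); it is not the format described
in the earlier mail and not CarbonData's on-disk format.

```python
import io
import json
import struct


def write_bloom_file(stream, blooms):
    """Write each bloom payload, then a JSON footer index, then the footer length."""
    index = {}
    for name, payload in blooms.items():
        index[name] = {"offset": stream.tell(), "length": len(payload)}
        stream.write(payload)
    footer = json.dumps(index).encode("utf-8")
    stream.write(footer)
    stream.write(struct.pack(">I", len(footer)))  # last 4 bytes: footer size


def read_footer(stream):
    """Read only the footer: the last 4 bytes give its length."""
    stream.seek(-4, io.SEEK_END)
    (footer_len,) = struct.unpack(">I", stream.read(4))
    stream.seek(-(4 + footer_len), io.SEEK_END)
    return json.loads(stream.read(footer_len).decode("utf-8"))


def read_bloom(stream, index, name):
    """Seek straight to one bloom payload using the footer index."""
    entry = index[name]
    stream.seek(entry["offset"])
    return stream.read(entry["length"])


# Usage with an in-memory file; the payload bytes are placeholders.
buf = io.BytesIO()
write_bloom_file(buf, {"task-0": b"\x01\x02", "blocklet-0": b"\x03\x04\x05"})
footer_index = read_footer(buf)
blocklet_bloom = read_bloom(buf, footer_index, "blocklet-0")
```

With a footer like this, a reader can locate a single task or blocklet bloom
without scanning the whole file, while the file itself stays outside the
carbondata file.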

Regards,
Ravindra.
