Apache CarbonData Dev Mailing List archive

[DISCUSSION] Distributed Index Cache Server

Posted by kunalkapoor on Feb 05, 2019; 10:57am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Distributed-Index-Cache-Server-tp75008.html

Hi All,

Carbon currently caches all block/blocklet datamap index information in
the driver. For bloom-type datamaps, it can instead prune the splits in a
distributed way using distributed datamap pruning. The first approach is
limited because the cache is bounded by driver memory, which is hard to
scale up, and because one driver's cache cannot be reused by other
drivers. The second approach is limited because there is no guarantee that
the next query goes to the same executor, so the cache may not be reused.

Given these problems, there is a need for a centralised index cache
server.

Please find the link to the design document below.

https://docs.google.com/document/d/161NXxrKLPucIExkWip5mX00x2iOPH6bvsuQnCzzp47E/edit?ts=5c542ab4#heading=h.x0qaehgkncz5

Thanks

Kunal Kapoor