[DISCUSSION] Distributed Index Cache Server
Posted by kunalkapoor on Feb 05, 2019; 10:57am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Distributed-Index-Cache-Server-tp75008.html
Hi All,
Carbon currently caches all block/blocklet datamap index information in
the driver. For bloom datamaps, it can instead prune the splits in a
distributed way using distributed datamap pruning. The first case has two
limitations: driver memory has to be scaled up as the cached index grows,
and one driver's cache cannot be reused by other drivers. The second case
has its own limitation: there is no guarantee that the next query goes to
the same executor, so the cache may not be reused.
Given these problems, there is a need for a centralised index cache
server.
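To illustrate the cache reuse problem, here is a minimal sketch in Python. It
is a toy model and not Carbon code: the class names, the random routing of
queries to executors, and the segment ids are all assumptions made for
illustration. It only counts how often an index has to be loaded when each
executor keeps its own cache, compared with one shared cache.

```python
import random


class ExecutorLocalCache:
    """Hypothetical model: every executor holds its own index cache and
    each pruning request lands on a randomly chosen executor."""

    def __init__(self, num_executors, seed=0):
        self.caches = [set() for _ in range(num_executors)]
        self.rng = random.Random(seed)
        self.loads = 0  # number of times an index had to be loaded

    def prune(self, segment):
        cache = self.rng.choice(self.caches)  # no executor affinity
        if segment not in cache:
            self.loads += 1  # cache miss: load the index again
            cache.add(segment)


class CentralIndexCache:
    """Hypothetical model: one shared cache that serves every request."""

    def __init__(self):
        self.cache = set()
        self.loads = 0

    def prune(self, segment):
        if segment not in self.cache:
            self.loads += 1  # loaded once, then reused by all queries
            self.cache.add(segment)


def run(cache, segments, queries):
    """Send `queries` rounds of pruning over all segments; return load count."""
    for _ in range(queries):
        for segment in segments:
            cache.prune(segment)
    return cache.loads


segments = ["seg_0", "seg_1", "seg_2"]  # made-up segment ids
central_loads = run(CentralIndexCache(), segments, queries=20)
local_loads = run(ExecutorLocalCache(num_executors=4), segments, queries=20)
print("central:", central_loads, "executor-local:", local_loads)
```

In this model the shared cache loads each segment index exactly once, while the
executor-local caches can load the same index up to once per executor.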
Please find below the link for the design document.
https://docs.google.com/document/d/161NXxrKLPucIExkWip5mX00x2iOPH6bvsuQnCzzp47E/edit?ts=5c542ab4#heading=h.x0qaehgkncz5
Thanks
Kunal Kapoor