Hi All,
Currently in carbondata we have an LRU caching feature for maintaining the BTree and Dictionary cache. This feature is helpful on low-end systems where memory is limited, or where the user wants control over the memory used by the carbondata system for caching.

In the LRU cache, an atomic access count variable is maintained for every key in the cache map; it is incremented when a query accesses that key and decremented once that query has finished using it.

There are many places where we access dictionary columns, such as decoding values for result preparation, filter operations, data loading, etc., and it becomes a cumbersome process to maintain the access count at the entry and exit points of each operation. If there is any inconsistency in incrementing and decrementing the access count, the corresponding key will never be cleared from the caching map, and if space is not freed, queries will start failing with an unavailable-memory exception.

Therefore I suggest the following behavior:

1. Remove access-count-based removal from the caching framework and make the framework purely LRU based.
2. Ensure that for one query the BTree and dictionary cache is accessed at most once by the driver and the executor.
3. Fail the query if the size required by the dictionary column or BTree is more than the size configured by the user for the LRU cache. This is because the user should be made aware that the cache size needs to be increased; the carbondata system should not take any runtime decision.

Please share your inputs on this.

Regards
Manish Gupta
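For readers unfamiliar with the pattern being discussed, the following is a minimal sketch (with hypothetical class and method names, not actual carbondata code) of the access-count scheme described above: each cached entry carries an atomic access count, and an entry is only eligible for eviction when its count is zero. It illustrates the failure mode mentioned in the mail: if any caller misses the matching release, the entry is pinned in memory forever.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an access-count-guarded cache, illustrating the
// mechanism (and the leak risk) described in the mail above.
class AccessCountedCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final AtomicInteger accessCount = new AtomicInteger(0);
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        map.put(key, new Entry<>(value));
    }

    // Every get() must be paired with a release(), typically in a finally
    // block; a missed release() makes the entry un-evictable.
    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        e.accessCount.incrementAndGet();
        return e.value;
    }

    void release(K key) {
        Entry<V> e = map.get(key);
        if (e != null) {
            e.accessCount.decrementAndGet();
        }
    }

    // Eviction skips any entry still "in use"; a leaked count pins it forever.
    boolean tryEvict(K key) {
        Entry<V> e = map.get(key);
        if (e == null || e.accessCount.get() > 0) {
            return false;
        }
        map.remove(key);
        return true;
    }
}
```

The burden is on every call site to pair get() with release(); the proposal above removes that burden by dropping the count entirely.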
Looks fine, but failing the query when memory is not enough to fit into the LRU cache is not a good idea. In LRU caching terminology there is no question of failing queries: we just evict the old data from the cache if memory is not sufficient to add the latest entry. We should log both the eviction and the addition so that the user can analyse whether the cache size should be increased or not.

Regards,
Ravindra.

On 15 May 2017 at 11:54, manish gupta <[hidden email]> wrote:
> [quoted message trimmed]

--
Thanks & Regards,
Ravi
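The plain evict-on-overflow behavior suggested in this reply can be sketched with Java's standard LinkedHashMap, which supports LRU ordering out of the box (class name and capacity here are illustrative, not carbondata code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of size-capped LRU eviction: once the configured
// capacity is exceeded, the least-recently-used entry is evicted and the
// eviction is logged so the user can judge whether to enlarge the cache.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    SimpleLruCache(int maxEntries) {
        // accessOrder = true makes iteration order least- to most-recently
        // used, which is what gives LRU eviction semantics.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        boolean evict = size() > maxEntries;
        if (evict) {
            // Log the eviction so the user can analyse cache sizing.
            System.out.println("Evicting LRU entry: " + eldest.getKey());
        }
        return evict;
    }
}
```

No access counts are needed: eviction is driven purely by capacity and recency, which is the behavior this reply advocates instead of failing the query.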