Apache CarbonData Dev Mailing List archive

Re: [DISCUSSION] Cache Pre Priming

Posted by akashnilugal@gmail.com on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Cache-Pre-Priming-tp83559p83601.html

Hi xuchianyin,

Thanks for the questions.

1. The current implementation does not need to load all the segments. Only
the required segments are loaded for a filter query, and all segments are
loaded for a query like count(*).

2. Cache loading is triggered during the pruning phase of a query. If the
index server is enabled, pruning goes to the index server, which prunes and
loads the cache. If the index server is disabled and distributed pruning is
enabled, distributed pruning happens; otherwise pruning is done on the
driver side. Please check the index server design doc for more info on this.

For auto compaction, there is no need to load into the index server, because
internally one more level of compaction can happen and the previously loaded
segments can become invalid. I will handle this in the design document.

3. The index server is a separate Spark application meant for caching. For
the SDK, a Spark session does not come into the picture, so this is not
applicable to the SDK. For the file format case, we will handle it.
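
To illustrate points 1 and 2, here is a rough Python sketch. It is not
CarbonData code: the function names, flag names, and segment IDs are all
hypothetical and only mirror the behaviour described above (which pruning
path a query takes, and which segments end up being loaded into the cache).

```python
from enum import Enum
from typing import List


class PruningPath(Enum):
    INDEX_SERVER = "index server"
    DISTRIBUTED = "distributed pruning"
    DRIVER = "driver-side pruning"


def choose_pruning_path(index_server_enabled: bool,
                        distributed_pruning_enabled: bool) -> PruningPath:
    """Pick the pruning path in the order described in point 2.

    Both flags are hypothetical stand-ins for the real configuration.
    """
    if index_server_enabled:
        # The index server prunes and loads the cache as a side effect.
        return PruningPath.INDEX_SERVER
    if distributed_pruning_enabled:
        return PruningPath.DISTRIBUTED
    return PruningPath.DRIVER


def segments_to_cache(all_segments: List[str],
                      segments_matching_filter: List[str],
                      is_count_star: bool) -> List[str]:
    """Return the segments whose datamaps a query would load (point 1).

    A count(*) touches every segment; a filter query only touches the
    segments that the filter requires.
    """
    if is_count_star:
        return list(all_segments)
    return [s for s in all_segments if s in segments_matching_filter]


if __name__ == "__main__":
    segments = ["0", "1", "2", "3"]
    print(choose_pruning_path(True, False).value)
    print(segments_to_cache(segments, ["1", "3"], is_count_star=False))
    print(segments_to_cache(segments, [], is_count_star=True))
```

With pre-priming, the idea would be to run the equivalent of the count(*)
case right after a load finishes, instead of waiting for the first query.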

Please get back to me with any clarifications or inputs.

Thanks and Regards

Akash R Nilugal

On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal <[hidden email]> wrote:

> Hi Community,
>
> Currently, we have an index server which basically helps in distributed
> caching of the datamaps in a separate spark application.
>
> Caching of the datamaps in the index server starts once a query is
> fired on the table for the first time. All the datamaps are loaded
> if a count(*) is fired, and only the required ones are loaded for a
> filter query.
>
>
> The problem, or bottleneck, here is that until a query is fired on the
> table, no caching is done for the table's datamaps.
>
> Consider a scenario where we just load data into a table for a whole day
> and then query it the next day. All the segments will start loading into
> the cache at that point, so the first query will be slow.
>
>
> What if we load the datamaps into the cache, or pre-prime the cache,
> without waiting for any query on the table?
>
> What if we load the cache after every load is done, or load the cache
> for all the segments at once, so that the first query need not do all
> this work and is therefore faster?
>
>
> I have attached the design document for pre-priming the cache in the
> index server. Please have a look at it;
> any suggestions or inputs on this are most welcome.
>
>
>
> https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
>
>
>
> Regards,
>
> Akash R Nilugal
>