Apache CarbonData Dev Mailing List archive

[DISCUSSION] Cache Pre Priming

akashnilugal@gmail.com

Aug 15, 2019; 12:03pm

[DISCUSSION] Cache Pre Priming

Hi Community,

Currently, we have an index server that provides distributed caching of the datamaps in a separate Spark application.

Caching of the datamaps in the index server starts only when a query is fired on the table for the first time: all the datamaps are loaded if a count(*) is fired, and only the required ones are loaded for a filter query.
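
To make the current behaviour concrete, here is a minimal sketch of query-triggered caching. It is a toy model, not CarbonData's actual API: the class `LazyIndexCache`, its methods, and the idea that the caller passes in the segments a filter query needs are all assumptions made for illustration.

```python
class LazyIndexCache:
    """Toy model of query-triggered caching (not CarbonData's real API)."""

    def __init__(self, datamaps_by_segment):
        # Hypothetical stand-in for the datamaps stored with each segment.
        self._store = dict(datamaps_by_segment)
        self.cached = {}

    def _load(self, segment_id):
        # Load a datamap into the cache only if it is not there yet.
        if segment_id not in self.cached:
            self.cached[segment_id] = self._store[segment_id]

    def count_star(self):
        # count(*) touches every segment, so every datamap gets cached.
        for segment_id in self._store:
            self._load(segment_id)

    def filter_query(self, required_segments):
        # A filter query caches only the datamaps it actually needs.
        for segment_id in required_segments:
            self._load(segment_id)


cache = LazyIndexCache({"0": "dm0", "1": "dm1", "2": "dm2"})
print(sorted(cache.cached))  # nothing is cached before the first query
cache.filter_query(["1"])
print(sorted(cache.cached))  # only the segment the filter needed
cache.count_star()
print(sorted(cache.cached))  # every segment
```

In this model the cache stays empty until the first query arrives, which is the behaviour described above.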

The bottleneck here is that until a query is fired on the table, the datamaps of that table are not cached.

Consider a scenario where we only load data into the table for a whole day and query it the next day: all the segments then start loading into the cache at once, so the first query will be slow.

What if we load the datamaps into the cache, that is, pre-prime the cache, without waiting for any query on the table?

What if we load the cache after every load is done, or load the cache for all the segments at once, so that the first query does not have to do all this work and therefore runs faster?
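
The two options can be sketched roughly as follows. This is only an illustration under stated assumptions: `PrePrimedIndexCache`, `prime`, `prime_all` and the `cold_loads` counter are hypothetical names, and the actual design is the one described in the attached document, not this code.

```python
class PrePrimedIndexCache:
    """Toy model of a cache that can be filled before any query arrives."""

    def __init__(self):
        self.cached = {}
        self.cold_loads = 0  # datamaps that had to be loaded at query time

    def prime(self, segment_id, datamap):
        # Option 1: call this right after a data load finishes.
        self.cached[segment_id] = datamap

    def prime_all(self, datamaps_by_segment):
        # Option 2: load the datamaps of all segments in one go.
        for segment_id, datamap in datamaps_by_segment.items():
            self.prime(segment_id, datamap)

    def query(self, segment_id, store):
        # Fall back to loading at query time only on a cache miss.
        if segment_id not in self.cached:
            self.cold_loads += 1
            self.cached[segment_id] = store[segment_id]
        return self.cached[segment_id]


store = {"0": "dm0", "1": "dm1"}  # hypothetical segment -> datamap mapping
cache = PrePrimedIndexCache()
cache.prime("0", store["0"])      # after a single load
cache.prime_all(store)            # or for all segments at once
cache.query("1", store)
print(cache.cold_loads)           # counts query-time loads
```

With the cache primed up front, the first query finds every datamap already in memory and does no loading of its own.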

I have attached the design document for pre-priming the cache in the index server. Please have a look; any suggestions or inputs are most welcome.

https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing

Regards,

Akash R Nilugal

manhua

Aug 16, 2019; 3:39am

Re: [DISCUSSION] Cache Pre Priming

Hi Akash,
Could you please raise a JIRA and attach the design doc? I cannot access the link.

Thanks

Regards
Manhua

akashnilugal@gmail.com

Aug 16, 2019; 10:56am

Re: [DISCUSSION] Cache Pre Priming

In reply to this post by akashnilugal@gmail.com

Hi All,

I have raised a JIRA and attached the design doc there. Please refer to CARBONDATA-3492.

Regards,
Akash
