Posted by Manhua-2 on Aug 19, 2019; 2:38am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Cache-Pre-Priming-tp83559p83585.html
Hi, I have come up with the following ideas:
1. Although the index server can provide more memory to hold the cache for index data, its space is still limited.
So cache management (especially cache invalidation) needs attention if we pre-prime during data load or at index server start, since either can easily fill up the index server's memory over time.
2. Pre-priming is an optional optimization, and it should focus on what the user actually wants to optimize.
So, regarding how pre-priming chooses what to cache, I think the configuration can support a regex/wildcard match list:
- During start of the index server, check and pre-prime matching EXISTING tables;
- During data load, check and pre-prime matching NEW tables or NEW segments.
This lightens the workload and helps keep the targeted tables cached rather than swapped out when many indexes are loaded into the cache.
3. The cache command can be another way to pre-prime, manually, for testing or for embedding in code.
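
To make the match list in point 2 concrete, here is a minimal sketch in Python (only for illustration; it is not CarbonData code). The setting `PREPRIME_PATTERNS`, its comma-separated "db.table" format, and all function names are assumptions of mine, not existing options or APIs.

```python
from fnmatch import fnmatchcase

# Hypothetical setting: comma-separated "db.table" wildcard patterns.
# The name and format are assumptions, not an existing CarbonData property.
PREPRIME_PATTERNS = "sales.*, default.orders_20??"


def parse_patterns(raw):
    """Split the comma-separated pattern list, dropping blank entries."""
    return [p.strip().lower() for p in raw.split(",") if p.strip()]


def should_preprime(db, table, patterns):
    """Return True if db.table matches any configured wildcard pattern."""
    name = f"{db}.{table}".lower()
    return any(fnmatchcase(name, p) for p in patterns)


def tables_to_preprime(existing_tables, patterns):
    """At index server start: keep only the existing tables that match."""
    return [(db, t) for db, t in existing_tables if should_preprime(db, t, patterns)]
```

At index server start, `tables_to_preprime` would filter the existing tables; after a data load, `should_preprime` would be checked for the loaded table before pre-priming the new segment. Tables that do not match are simply cached on first query, as today.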
On 2019/08/16 10:56:33, Akash Nilugal <[hidden email]> wrote:
> Hi All,
>
> I have raised a JIRA and attached the design doc there. Please refer to:
>
> CARBONDATA-3492
>
> Regards,
> Akash
>
> On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal <[hidden email]> wrote:
>
> > Hi Community,
> >
> > Currently, we have an index server which helps with distributed
> > caching of the datamaps in a separate Spark application.
> >
> > Caching of the datamaps in the index server starts only when a query is
> > fired on the table for the first time. All the datamaps are loaded
> > if count(*) is fired, and only the required ones are loaded for a filter
> > query.
> >
> > The problem, or bottleneck, is that until a query is fired
> > on the table, nothing is cached for that table's datamaps.
> >
> > So consider a scenario where we just load data into a table for a
> > whole day and query it the next day:
> > all the segments start loading into the cache then, so the first query
> > is slow.
> >
> > What if we load the datamaps into the cache, or pre-prime the cache,
> > without waiting for any query on the table?
> >
> > What if we load the cache after every load is done, or load
> > the cache for all the segments at once,
> > so that the first query need not do all this work, which makes it faster?
> >
> > I have attached the design document for pre-priming the cache in the
> > index server. Please have a look;
> > any suggestions or inputs are most welcome.
> >
> > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> >
> > Regards,
> >
> > Akash R Nilugal
> >
>