Apache CarbonData Dev Mailing List archive

Re: [DISCUSSION] Cache Pre Priming

Posted by Manhua-2 on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Cache-Pre-Priming-tp83559p83602.html

Hi Akash,

1. The cache will eventually become full if loading keeps running all the time. The reason I mentioned invalidation is to avoid the case, in particular, where the cache becomes full before all the targeted indexes are loaded.

When the server is just starting, continuing to pre-prime and swapping out the earliest loaded indexes is not good.
Maybe pre-prime should check the available cache capacity before loading each index, and stop pre-priming once the cache is full?
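To make the suggestion concrete, here is a minimal Python sketch. It assumes the cache is sized in bytes and that queries use plain LRU eviction. Pre-priming checks free capacity before loading each index and stops at the first one that would not fit, so it never swaps out earlier entries. The query path, by contrast, still evicts the least recently used entries. `IndexCache`, `pre_prime` and the sizes are hypothetical. They are not CarbonData's index server API, and the real eviction policy may differ from plain LRU.

```python
from collections import OrderedDict


class IndexCache:
    """Hypothetical LRU cache of index (datamap) entries, sized in bytes."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> size, least recently used first

    def free_bytes(self):
        return self.capacity_bytes - self.used_bytes

    def put(self, key, size):
        """Query path: cache an entry, evicting least recently used entries if needed."""
        if size > self.capacity_bytes:
            return False  # can never fit, do not cache it
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        while self.used_bytes + size > self.capacity_bytes:
            _, evicted_size = self.entries.popitem(last=False)  # evict the oldest entry
            self.used_bytes -= evicted_size
        self.entries[key] = size
        self.used_bytes += size
        return True


def pre_prime(cache, segments):
    """Pre-prime path: load (segment_id, size) pairs while they fit; never evict.

    Stops at the first segment that does not fit in the remaining free capacity
    and returns the ids that were loaded.
    """
    primed = []
    for segment_id, size in segments:
        if segment_id in cache.entries:
            continue  # already cached, nothing to do
        if size > cache.free_bytes():
            break  # cache is full: stop pre-priming instead of swapping out entries
        cache.put(segment_id, size)
        primed.append(segment_id)
    return primed


cache = IndexCache(capacity_bytes=100)
print(pre_prime(cache, [("seg0", 40), ("seg1", 40), ("seg2", 40)]))  # ['seg0', 'seg1']
cache.put("seg2", 40)  # a later query evicts seg0, the least recently used entry
print(list(cache.entries))  # ['seg1', 'seg2']
```

This split keeps the two concerns separate. Startup pre-priming fills only the free space. The existing eviction policy still applies when a real query needs an index that is not cached.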

2. I think regex/wildcard patterns are more flexible to use, for example:
*.* for all databases and tables
test.* for all tables in the test database
test.day_table_201908* for tables with the given prefix
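Shell-style wildcards like these map directly onto Python's standard `fnmatch` module. The following is a minimal sketch. It assumes the pre-prime configuration is a list of `db.table` patterns and that names are compared case-insensitively, as Spark SQL identifiers usually are. `tables_to_pre_prime` is a hypothetical helper, not an existing CarbonData property or API. Full regex support would use `re.fullmatch` instead.

```python
from fnmatch import fnmatchcase


def tables_to_pre_prime(patterns, tables):
    """Return the 'db.table' names matching any configured wildcard pattern.

    Names and patterns are lower-cased first, so matching is case-insensitive.
    In fnmatch syntax '*' also matches '.', so '*.*' covers every 'db.table'.
    """
    return [
        table
        for table in tables
        if any(fnmatchcase(table.lower(), pattern.lower()) for pattern in patterns)
    ]


tables = ["default.sales", "test.orders", "test.day_table_20190801", "test.day_table_20190901"]
print(tables_to_pre_prime(["test.day_table_201908*"], tables))  # ['test.day_table_20190801']
print(tables_to_pre_prime(["test.*"], tables))  # the three tables in the test database
```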

3. Yes, you are right, firing a count(*) can do that.

On 2019/08/19 09:23:06, Akash Nilugal <[hidden email]> wrote:

> Hi Manhua,
>
> Thanks for the inputs.
>
> 1. There is no need to handle cache invalidation separately. I agree that
> the cache has a limit, but since we already have an eviction policy, when
> the next query comes it will evict entries as required and load the
> segments it needs. So it is better not to have a separate mechanism to
> invalidate the cache during pre-prime.
>
> 2.
> i. For configuring pre-prime, we can already specify the database name or
> table name. We will note the regex support, and based on other use cases
> and their impact, I will update the design document.
> ii. During load, there is no need to load the table or read any
> configuration for pre-prime. When pre-priming during load, just take the
> newly loaded segment and load it into the cache.
>
> 3. For command support, can you please explain with more use cases? The
> index server already loads the cache at startup, and even without a new
> command, firing a count(*) will load all the segments. So I think a new
> command won't be necessary.
>
> Please get back to me with any clarifications or doubts.
>
> Thanks
>
> Regards,
> Akash R Nilugal
>
> On Fri, Aug 16, 2019, 4:26 PM Akash Nilugal <[hidden email]> wrote:
>
> > Hi All,
> >
> > I have raised a JIRA and attached the design doc there. Please refer to
> >
> > CARBONDATA-3492
> >
> > Regards,
> > Akash
> >
> > On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal <[hidden email]>
> > wrote:
> >
> >> Hi Community,
> >>
> >> Currently, we have an index server which helps with distributed caching
> >> of the datamaps in a separate Spark application.
> >>
> >> Caching of the datamaps in the index server starts only when a query is
> >> fired on the table for the first time: all the datamaps are loaded if a
> >> count(*) is fired, and only the required ones are loaded for a filter
> >> query.
> >>
> >> The problem, or bottleneck, here is that until a query is fired on the
> >> table, the table's datamaps are not cached.
> >>
> >> For example, consider a scenario where we load data into a table for the
> >> whole day and then query it the next day. Only then will all the segments
> >> start loading into the cache, so the first query will be slow.
> >>
> >> What if we load the datamaps into the cache, i.e. pre-prime the cache,
> >> without waiting for any query on the table?
> >>
> >> That is, what if we load the cache after every data load is done, or load
> >> the cache for all the segments at once, so that the first query need not
> >> do all this work, which makes it faster?
> >>
> >> Here I have attached the design document for pre-priming the cache in the
> >> index server. Please have a look at it; any suggestions or inputs are
> >> most welcome.
> >>
> >> https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> >>
> >> Regards,
> >>
> >> Akash R Nilugal
> >>
> >
>
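A small model can show why the proposal helps the first query. It uses the whole-day loading scenario from the original message. This minimal Python sketch assumes a cache that only tracks which segments' datamaps are loaded. `IndexServerCache`, `run_query`, `on_load_complete` and the hourly load schedule are hypothetical illustrations, not CarbonData's index server API. The sketch also ignores distribution across executors and eviction.

```python
class IndexServerCache:
    """Hypothetical stand-in for the index server cache: tracks cached segment ids."""

    def __init__(self):
        self.cached = set()

    def load(self, segment_ids):
        """Cache the datamaps of the given segments; return the ids that were missing."""
        missing = [s for s in segment_ids if s not in self.cached]
        self.cached.update(missing)
        return missing


def run_query(cache, table_segments, required_segments=None):
    """count(*) (no filter) needs every segment; a filter query only the required ones.

    Returns the segment ids whose datamaps had to be loaded while the query waited.
    """
    targets = table_segments if required_segments is None else required_segments
    return cache.load(targets)


def on_load_complete(cache, new_segment_id, pre_prime):
    """Proposed hook: once a data load finishes, pre-prime only the new segment."""
    if pre_prime:
        cache.load([new_segment_id])


def first_query_misses(pre_prime, loads_per_day=24):
    """Load data all day (one segment per load), then fire count(*) the next day."""
    cache = IndexServerCache()
    segments = []
    for i in range(loads_per_day):
        segment_id = f"segment_{i}"
        segments.append(segment_id)
        on_load_complete(cache, segment_id, pre_prime)
    return run_query(cache, segments)


print(len(first_query_misses(pre_prime=False)))  # 24
print(len(first_query_misses(pre_prime=True)))  # 0
```

Without pre-priming, the next day's first count(*) has to load the datamaps of all 24 segments while the query waits. With pre-priming, it finds them already cached. Because the post-load hook handles only the newly written segment, the load path reads no extra configuration. This matches point 2.ii of the reply.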