Apache CarbonData Dev Mailing List archive

[DISCUSSION] Cache Pre Priming

kumarvishal09

Re: [DISCUSSION] Cache Pre Priming

Hi Akash,

I think it is better to pre-prime only after each load (async).

As mentioned in the design doc, when the index server is started with a table
or db configured, the index server won't be available for queries until
everything configured has been loaded into the cache. So queries cannot
benefit from pre-priming until all the metadata is loaded into the cache.
To avoid this, the user can run count(*) after startup to pre-prime only the
required tables. No extra DDL is needed, as count(*) can be used as a DDL to
load the cache.

-Regards
Kumar Vishal

On Wed, Aug 21, 2019 at 6:57 PM Akash Nilugal <[hidden email]>
wrote:

> Hi Chetan,
>
> As mentioned in the design, loading to cache will be an async operation, and
> we will load only the corresponding segment to cache, so there won't be any
> performance hit.
> Logs will be added.
>
> On 2019/08/21 13:18:05, chetan bhat <[hidden email]> wrote:
> > Hi Akash,
> >
> > 1. Will the performance of the end-to-end data load operation be impacted
> > if the segment datamap is loaded to cache once the load is finished?
> > 2. Will there be a notification in the logs stating that the loading of
> > the datamap cache is completed?
> >
> > Regards
> >
> > On 2019/08/15 12:03:09, Akash Nilugal <[hidden email]> wrote:
> > > Hi Community,
> > >
> > > Currently, we have an index server which helps in distributed caching of
> > > the datamaps in a separate Spark application.
> > >
> > > Caching of the datamaps in the index server starts once a query is fired
> > > on the table for the first time. All the datamaps are loaded if count(*)
> > > is fired, and only the required ones are loaded for a filter query.
> > >
> > > The bottleneck here is that until a query is fired on the table, the
> > > table's datamaps are not cached.
> > >
> > > So consider a scenario where we just load data into a table for a whole
> > > day and query it the next day. All the segments start loading into the
> > > cache at that point, so the first query will be slow.
> > >
> > > What if we load the datamaps into the cache, or pre-prime the cache,
> > > without waiting for any query on the table?
> > >
> > > What if we load the cache after every load is done, or load the cache for
> > > all the segments at once, so that the first query need not do all this
> > > work, which makes it faster?
> > >
> > > I have attached the design document for pre-priming the cache in the
> > > index server. Please have a look; any suggestions or inputs are most
> > > welcome.
> > >
> > > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> > >
> > > Regards,
> > >
> > > Akash R Nilugal
> > >
> >
>
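
The flow discussed above (finish the data load, then prime only that segment in the background so the load itself is not slowed down) could be sketched roughly as below. This is only an illustration in Python, not CarbonData code: `cache`, `load_segment`, `prime_segment` and `load_then_prime` are hypothetical names, and the dictionary stands in for the index server cache.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the index server's datamap cache.
cache = {}


def load_segment(table, segment_id):
    """Pretend to write a segment; returns the segment id once the load is done."""
    return segment_id


def prime_segment(table, segment_id):
    """Pretend to read the new segment's datamap and place it in the cache."""
    cache[(table, segment_id)] = f"datamap({table}, {segment_id})"
    # Stands in for the log line promised in the thread.
    print(f"pre-primed cache for {table} segment {segment_id}")


def load_then_prime(executor, table, segment_id):
    """Finish the load first, then hand priming to a background worker."""
    load_segment(table, segment_id)
    # The caller gets a future back and does not wait for priming to finish.
    return executor.submit(prime_segment, table, segment_id)


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [load_then_prime(pool, "sales", seg) for seg in ("0", "1")]
    for future in futures:
        future.result()  # only this demo waits; a real load would return earlier
```

Only the segment that was just loaded is primed, which matches the reply that the load should not take a hit.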

akashnilugal@gmail.com

Re: [DISCUSSION] Cache Pre Priming

Hi vishal,

Your point is correct; we can focus on just loading into the cache after the data load is finished (async operation).
For DDL support, count(*) can be run on all the required tables to load them into the cache.

Regards,
Akash
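
As a rough illustration of the count(*) approach agreed on here, a start-up warm-up could look like the sketch below. It is written in Python with hypothetical helper names (`warm_up_statements`, `warm_up`) and example table names; `run_sql` is whatever function the caller uses to submit SQL, and nothing here is an existing CarbonData API.

```python
def warm_up_statements(tables):
    """Build one count(*) query per table that should be cached up front."""
    return [f"SELECT count(*) FROM {name}" for name in tables]


def warm_up(run_sql, tables):
    """Run each warm-up query through run_sql, a caller-supplied function."""
    for statement in warm_up_statements(tables):
        run_sql(statement)


# Demo with a recording stub. With Spark, one might instead pass a function
# that calls spark.sql(statement).collect() (assumption, not shown here).
issued = []
warm_up(issued.append, ["sales_db.orders", "sales_db.customers"])
```

Listing only the required tables keeps the warm-up limited to what the user actually wants cached.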

ravipesala

Re: [DISCUSSION] Cache Pre Priming

Hi Akash,

+1 for Vishal's suggestion. Better to focus on load data cache sync.

Regards,
Ravindra.

--
Thanks & Regards,
Ravi

David Cai

Re: [DISCUSSION] Cache Pre Priming

In reply to this post by akashnilugal@gmail.com

+1 for me

When a table cares about the performance of the first query, this feature will help to improve it.

For other tables that don't care about it, there may be no need to pre-cache the index.

So maybe there should be a table-level property to enable Cache Pre Priming.

Regards
David QiangCai

akashnilugal@gmail.com

Re: [DISCUSSION] Cache Pre Priming

Hi David,

Thanks for the input.

A query is going to run on the table at some point anyway. If we add one more table property, it only adds complexity: handling the property, compatibility, and set/unset support. Let's not make this more cumbersome. LRU will take care of evicting and loading if there is no space, and since this is async, there won't be any problem.

Regards,
Akash R Nilugal
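
To make the LRU argument concrete, here is a minimal sketch of least-recently-used eviction in Python. It is not the CarbonData cache: `LruCache` is a hypothetical class, and capacity is counted in entries for simplicity rather than in memory size.

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache: the least recently used entry is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used


cache = LruCache(capacity=2)
cache.put("t1/seg0", "datamap A")
cache.put("t1/seg1", "datamap B")
cache.get("t1/seg0")                # a query touches seg0
cache.put("t2/seg0", "datamap C")   # pre-priming a new segment evicts seg1
```

Pre-priming a table that is never queried therefore costs cache space only until hotter entries push it out.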

J 12323123

Unsubscribeme

In reply to this post by akashnilugal@gmail.com

vikramahuja1001

Re: [DISCUSSION] Cache Pre Priming

In reply to this post by akashnilugal@gmail.com

Hi Community!
Support for pre-priming in the case of Bloom and Lucene has to be removed
from the design document, as those datamaps are created only at query time
and not at load time. Since they are not created at load time, they cannot
be pre-primed. PFA the updated design document.

https://docs.google.com/document/d/1OXaOnofshqdT-qItU9AYtdcMroFO3UvXDybF0fXi-T4/edit?usp=sharing

Vikram Ahuja

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
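
The exclusion described in this message amounts to a filter applied before pre-priming. A small Python sketch under stated assumptions: the type labels (`bloomfilter`, `lucene`, `blocklet`) and the datamap names are illustrative, and `primable` is a hypothetical helper, not a function from the design document.

```python
# Datamap kinds the thread says cannot be pre-primed; labels are illustrative.
NOT_PRIMABLE = {"bloomfilter", "lucene"}


def primable(datamaps):
    """Keep only the datamaps that exist right after a load."""
    return [d for d in datamaps if d["type"].lower() not in NOT_PRIMABLE]


datamaps = [
    {"name": "default_index", "type": "blocklet"},
    {"name": "dm_bloom", "type": "bloomfilter"},
    {"name": "dm_text", "type": "lucene"},
]
eligible = primable(datamaps)
```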

kumarvishal09

Re: [DISCUSSION] Cache Pre Priming

+1
Regards
Kumar Vishal

akashnilugal@gmail.com

Re: [DISCUSSION] Cache Pre Priming

In reply to this post by vikramahuja1001

+1

Regards,
Akash R Nilugal

Ajantha Bhat

Re: [DISCUSSION] Cache Pre Priming

+1

Thanks,
Ajantha
