Hi Akash,
I think it is better to pre-prime only after each load (async).

As mentioned in the design doc, when the index server is started with a table or db configured, the index server won't be available for queries until everything configured has been loaded into the cache. So queries cannot benefit from pre-priming until all the metadata is loaded into the cache.
To avoid this, the user can run count(*) after startup to pre-prime only the required tables. No extra DDL is required, as count(*) can be used as a DDL to load the cache.

-Regards
Kumar Vishal

On Wed, Aug 21, 2019 at 6:57 PM Akash Nilugal <[hidden email]> wrote:
> Hi chetan,
>
> As mentioned in the design, loading to the cache will be an async operation, and we will load only the corresponding segment into the cache, so there won't be any performance hit.
> Logs will be added.
>
> On 2019/08/21 13:18:05, chetan bhat <[hidden email]> wrote:
> > Hi Akash,
> >
> > 1. Will the performance of the end-to-end data load operation be impacted if the segment datamap is loaded into the cache once the load is finished?
> > 2. Will there be a notification in the logs stating that the loading of the datamap cache is completed?
> >
> > Regards
> >
> > On 2019/08/15 12:03:09, Akash Nilugal <[hidden email]> wrote:
> > > Hi Community,
> > >
> > > Currently, we have an index server which helps in distributed caching of the datamaps in a separate Spark application.
> > >
> > > The caching of the datamaps in the index server starts once a query is fired on the table for the first time. All the datamaps are loaded if count(*) is fired, and only the required ones are loaded for a filter query.
> > >
> > > The bottleneck here is that until a query is fired on the table, no caching is done for the table's datamaps.
> > >
> > > So consider a scenario where we just load data into the table for a whole day and then query it the next day: all the segments will start loading into the cache, so the first query will be slow.
> > >
> > > What if we load the datamaps into the cache, or pre-prime the cache, without waiting for any query on the table?
> > >
> > > What if we load the cache after every load is done, or load the cache for all the segments at once, so that the first query need not do all this work, which makes it faster?
> > >
> > > I have attached the design document for the pre-priming of the cache in the index server. Please have a look at it; any suggestions or inputs on this are most welcome.
> > >
> > > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> > >
> > > Regards,
> > > Akash R Nilugal
kumar vishal
|
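The count(*) warm-up suggested above can be sketched as follows. This is only an illustration, assuming the caller supplies some way to run SQL: `preprime_tables`, `run_sql` and the table names are hypothetical and not part of CarbonData.

```python
from typing import Callable, Iterable, List


def preprime_tables(run_sql: Callable[[str], object], tables: Iterable[str]) -> List[str]:
    """Issue one count(*) per table so that its index cache is loaded before real queries arrive."""
    issued = []
    for table in tables:
        statement = f"SELECT COUNT(*) FROM {table}"
        # The result is discarded; the side effect (loading the cache) is the point.
        run_sql(statement)
        issued.append(statement)
    return issued


if __name__ == "__main__":
    # Stand-in executor that only prints each statement.
    # With PySpark one could pass `lambda q: spark.sql(q).collect()`,
    # since `spark.sql` alone does not execute a SELECT until an action is called.
    preprime_tables(print, ["sales_db.orders", "sales_db.customers"])
```

Only the tables listed are touched, so the user decides which tables are worth warming after a restart.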
Hi vishal,
Your point is correct; we can focus on just loading to the cache after the data load is finished (async operation).
For DDL support, count(*) can be run on all the required tables to load them into the cache.

Regards,
Akash

On 2019/08/22 14:44:34, Kumar Vishal <[hidden email]> wrote:
> Hi Akash,
>
> I think it is better to pre-prime only after each load (async).
>
> As mentioned in the design doc, when the index server is started with a table or db configured, the index server won't be available for queries until everything configured has been loaded into the cache. So queries cannot benefit from pre-priming until all the metadata is loaded into the cache.
> To avoid this, the user can run count(*) after startup to pre-prime only the required tables. No extra DDL is required, as count(*) can be used as a DDL to load the cache.
>
> -Regards
> Kumar Vishal
|
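The direction agreed here (load only the new segment into the cache asynchronously once the data load finishes, and log completion) could look roughly like the sketch below. It is written in plain Python for illustration and is not CarbonData code: `finish_load`, `load_segment_cache` and the logger name are hypothetical.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

log = logging.getLogger("preprime")
_executor = ThreadPoolExecutor(max_workers=2)


def finish_load(table: str, segment_id: str,
                load_segment_cache: Callable[[str, str], None]) -> Future:
    """Schedule cache loading for a freshly loaded segment and return immediately."""

    def task() -> None:
        try:
            # Only the segment that was just loaded is sent to the cache.
            load_segment_cache(table, segment_id)
            log.info("Pre-priming finished for %s segment %s", table, segment_id)
        except Exception:
            # A failed pre-prime must not fail the data load itself.
            log.exception("Pre-priming failed for %s segment %s", table, segment_id)

    # submit() does not block, so the data load does not wait for the cache.
    return _executor.submit(task)
```

Because the work is submitted to a background pool, the end-to-end load time is not extended by the cache load, and the log line gives the completion notification that was asked about.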
Hi Akash,
+1 for Vishal's suggestion. Better to focus on cache sync on data load.

Regards,
Ravindra.

On Fri, 23 Aug 2019 at 16:35, Akash Nilugal <[hidden email]> wrote:
> Hi vishal,
>
> Your point is correct; we can focus on just loading to the cache after the data load is finished (async operation).
> For DDL support, count(*) can be run on all the required tables to load them into the cache.
>
> Regards,
> Akash

--
Thanks & Regards,
Ravi
|
In reply to this post by akashnilugal@gmail.com
+1 for me
For tables where the performance of the first query matters, this feature will help to improve it.

For other tables where it does not matter, there may be no need to pre-cache the index.

So maybe there should be a table-level property to enable cache pre-priming.

Regards
David QiangCai
|
Hi David,
Thanks for the input. A query is going to happen on the table at some point anyway. If we add one more table property, it will simply add complexity: handling the property, compatibility, and set and unset support. Let's not make this more cumbersome.
In any case, LRU will take care of evicting and loading if there is no space, and since this is async, there won't be any problem.

Regards,
Akash R Nilugal

On 2019/08/26 09:23:46, David Cai <[hidden email]> wrote:
> +1 for me
>
> When the table takes care of the performance of the first query, this feature will help to improve it.
>
> For other tables which don't take care of it, maybe no need to pre-cache index.
>
> So it maybe has a table-level property to enable Cache Pre Priming.
>
> Regards
> David QiangCai
|
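The LRU behaviour relied on here can be illustrated with a small size-bounded cache. This is a generic sketch of least-recently-used eviction, not the index server's actual cache: the class name, the byte-based capacity and the key format are all assumptions.

```python
from collections import OrderedDict
from typing import List


class SegmentIndexCache:
    """Size-bounded cache that evicts the least recently used entries first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Maps key -> size in bytes; the oldest entry sits at the front.
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def put(self, key: str, size_bytes: int) -> List[str]:
        """Insert or refresh an entry and return the keys evicted to make room."""
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)
        evicted = []
        # Note: an entry larger than the whole cache is still admitted in this sketch.
        while self._entries and self.used_bytes + size_bytes > self.capacity_bytes:
            old_key, old_size = self._entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_key)
        self._entries[key] = size_bytes
        self.used_bytes += size_bytes
        return evicted

    def touch(self, key: str) -> bool:
        """Mark an entry as recently used (for example on a query hit)."""
        if key not in self._entries:
            return False
        self._entries.move_to_end(key)
        return True
```

With a cache like this, pre-priming a table that is never queried costs space only until other entries need it, which is the argument made above against a separate table property.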
In reply to this post by akashnilugal@gmail.com
Hi Community!
Support for pre-priming in the case of Bloom and Lucene has to be removed from the design document, as those datamaps are created only at query time and not at load time. Since they are not created at load time, they cannot be pre-primed. PFA the updated design document.

https://docs.google.com/document/d/1OXaOnofshqdT-qItU9AYtdcMroFO3UvXDybF0fXi-T4/edit?usp=sharing

Vikram Ahuja

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
|
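The exclusion described in this message amounts to a filter over datamap types before pre-priming. The sketch below assumes each datamap is known as a (name, type) pair; the function name and the type labels are illustrative and are not taken from the design document.

```python
from typing import Iterable, List, Tuple

# Types that, per this thread, are built at query time,
# so there is nothing to load right after a data load.
QUERY_TIME_TYPES = {"bloomfilter", "lucene"}


def prepriming_candidates(datamaps: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the names of datamaps eligible for pre-priming from (name, type) pairs."""
    return [name for name, kind in datamaps if kind.lower() not in QUERY_TIME_TYPES]


if __name__ == "__main__":
    # Hypothetical datamaps on one table.
    print(prepriming_candidates([
        ("main_index", "default"),
        ("dm_bloom", "bloomfilter"),
        ("dm_text", "lucene"),
    ]))
```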
+1
Regards
Kumar Vishal

On Tue, Nov 26, 2019 at 7:02 PM vikramahuja1001 <[hidden email]> wrote:
> Hi Community!
> Support for pre-priming in the case of Bloom and Lucene has to be removed from the design document, as those datamaps are created only at query time and not at load time. Since they are not created at load time, they cannot be pre-primed. PFA the updated design document.
>
> https://docs.google.com/document/d/1OXaOnofshqdT-qItU9AYtdcMroFO3UvXDybF0fXi-T4/edit?usp=sharing
>
> Vikram Ahuja
kumar vishal
|
In reply to this post by vikramahuja1001
+1
Regards,
Akash R Nilugal

On 2019/11/26 13:47:11, vikramahuja1001 <[hidden email]> wrote:
> Hi Community!
> Support for pre-priming in the case of Bloom and Lucene has to be removed from the design document, as those datamaps are created only at query time and not at load time. Since they are not created at load time, they cannot be pre-primed. PFA the updated design document.
>
> https://docs.google.com/document/d/1OXaOnofshqdT-qItU9AYtdcMroFO3UvXDybF0fXi-T4/edit?usp=sharing
>
> Vikram Ahuja
|
+1
Thanks,
Ajantha

On Fri, 6 Dec, 2019, 8:38 pm Akash Nilugal, <[hidden email]> wrote:
> +1
>
> Regards,
> Akash R Nilugal
|