Hi,
*1. Current problem.*
1. Building the BTree for an index file takes far more space on the Java heap than the index file itself, because we create multiple objects for every leaf node. The LRU cache also accounts only for the index file size rather than the size of these objects, which distorts its eviction decisions.
2. Currently we load one BTree on the driver side to find the blocks, and another BTree on the executor side to find the blocklets. Since we increased the blocklet size to 128 MB and decreased the table_block size to 256 MB, the driver-side and executor-side BTrees have nearly the same number of nodes, so reading the same information twice is just overhead. Moreover, each executor is likely to reload its BTree for every query, because there is no guarantee that the same block goes to the same executor every time; this gets worse with dynamic containers.

*2. Proposed solution.*
1. To reduce the Java heap used by the BTree, we can remove the BTree data structure, keep a simple flat array instead, and do a binary search on it. We should also move these cached arrays to unsafe (off-heap/on-heap) memory to reduce the burden on GC.
2. Unify the two BTrees into a single BTree loaded on the driver side, so that a single lookup finds the blocklets directly and executors no longer need to load a BTree for every query. Once the memory footprint is reduced, we can consider moving this into a separate metadata service.

I will first take up point 1 (reducing the BTree size) and then the merging of the BTrees.

Please comment on it.

--
Thanks & Regards,
Ravindra.
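For illustration only, here is a minimal sketch of the flat-array index in point 1 of the proposal. The class and method names are hypothetical, not existing CarbonData APIs; it only shows how a binary search over one sorted array of start keys could replace the BTree traversal, with the key bytes as candidates for moving off-heap later.

/**
 * Hypothetical sketch: one index file cached as a single sorted array of
 * block/blocklet start keys. A binary search replaces the BTree traversal.
 */
public class FlatBlockIndex {

  private final byte[][] startKeys; // sorted start key of each entry

  public FlatBlockIndex(byte[][] sortedStartKeys) {
    this.startKeys = sortedStartKeys;
  }

  /**
   * Returns the index of the last entry whose start key is <= searchKey,
   * or 0 if the search key is below every start key.
   */
  public int findEntry(byte[] searchKey) {
    int low = 0;
    int high = startKeys.length - 1;
    int result = 0;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      if (compare(startKeys[mid], searchKey) <= 0) {
        result = mid;
        low = mid + 1;
      } else {
        high = mid - 1;
      }
    }
    return result;
  }

  private static int compare(byte[] a, byte[] b) {
    int len = Math.min(a.length, b.length);
    for (int i = 0; i < len; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}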
Hi Ravi,

Thank you for bringing this improvement discussion to the mailing list.

One question: how does point 1 solve the issue below? Will there still be two copies of the index information, one on the driver side and one on the executor side?
----------------------------------------------------------------------------------------------------
And also chances of loading btree on each executor is more for every query because there is no guarantee that same block goes to same executor every time. It will be worse in case of dynamic containers.

Regards
Liang
+1 for both proposals 1 and 2.

For point 2, do you have an idea of roughly how many blocklets there are within one block? That will help estimate the length of the array on the driver side.

Regards,
Jacky
Hi Jacky,
The default blocklet size is currently 64 MB, so with a 256 MB block size there are at most 4 blocklets per block.

Regards,
Ravindra.
Hi Liang,
Yes Liang, it will be done in two parts: first reduce the size of the BTree, then merge the driver-side and executor-side BTrees into a single BTree.

Regards,
Ravindra.
Hi Ravi,
I have a few queries about both parts of the solution.

1. To reduce the Java heap for the BTree, we can remove the BTree data structure, use a simple single array and do a binary search on it. And also move these cached arrays to unsafe (off-heap/on-heap) memory to reduce the burden on GC.

A single array of objects? If the user loads only a small amount of data in each load, keeping an array instead of a tree only saves the intermediate BTree nodes. Yes, keeping all the metadata (start key, min, max) off-heap will reduce the memory footprint. Is it possible to keep the data in a single byte array and maintain the offset of each key for each column?

2. Unify the BTrees into a single BTree loaded on the driver side, so that one lookup finds the blocklets directly and executors are not required to load the BTree for every query.

If we unify the BTrees into one, it will be loaded on the driver side, so the driver will need more memory to hold this metadata and will have to do blocklet-level pruning for every query; with concurrent queries the driver may be overloaded. We also need to consider how this metadata (the offset of each column in the file and other metadata) is sent to the executors, and the cost of serializing and deserializing it.

Please correct me if my comments are not valid.

-Regards
Kumar Vishal
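As a purely illustrative answer to the single-byte-array question, here is a small sketch (hypothetical class, not a CarbonData type) of packing all keys into one byte array with a parallel offset array, so no per-key objects stay on the heap and the whole blob could later be moved to unsafe memory.

import java.io.ByteArrayOutputStream;
import java.util.List;

/**
 * Hypothetical sketch: pack variable-length keys back to back into one
 * byte array and record only the start offset of each key. Lookup slices
 * a key out again instead of holding one object per key on the heap.
 */
public class PackedKeyStore {

  private final byte[] data;   // all keys, back to back
  private final int[] offsets; // offsets[i] = start of key i; offsets[n] = data.length

  public PackedKeyStore(List<byte[]> keys) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    offsets = new int[keys.size() + 1];
    for (int i = 0; i < keys.size(); i++) {
      offsets[i] = out.size();
      byte[] key = keys.get(i);
      out.write(key, 0, key.length);
    }
    offsets[keys.size()] = out.size();
    data = out.toByteArray();
  }

  /** Copies key i back out of the packed buffer. */
  public byte[] getKey(int i) {
    int start = offsets[i];
    int length = offsets[i + 1] - start;
    byte[] key = new byte[length];
    System.arraycopy(data, start, key, 0, length);
    return key;
  }
}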
On 18 May 2017, at 15:31, Kumar Vishal <[hidden email]> wrote:

> 1. To reduce the java heap for Btree, we can remove the Btree data structure and use simple single array and do binary search on it. And also we should move this cached arrays to unsafe (offheap/onheap) to reduce the burden on GC.

I think now is the right time to abstract the driver-side index into an interface, which can have multiple implementations: on-heap, off-heap, external service, etc.

> Single array of object?? ... Is it possible to maintain data in single byte array and maintaining the offset of each key for each column??

Do you mean putting the indexes of all segments into one single array? In that case we should compare an in-memory sequential scan against a binary search, which introduces random walks over memory. I feel a sequential scan may beat the random walk if the number of elements is not large.

> If we unify the btree to single btree, then it will be loaded in driver side ... And how we will send this metadata information (offset of each column in file and other metadata) to executor, serializing and deserializing this metadata cost also we need to consider.

Assuming there are only 4~8 blocklets in a block, I think we can send the blocklet ids for a block as a byte array. That should be fine, right?
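To make the serialization point concrete, here is a tiny, hypothetical sketch (not an existing API) of how the pruned blocklet ids of one block could travel to the executor as a plain byte array, given only a handful of blocklets per block.

/**
 * Hypothetical sketch: with at most a few blocklets per block, the pruned
 * blocklet ids can be shipped as one byte each, so the extra task
 * serialization cost stays negligible.
 */
public class BlockletIdCodec {

  /** Encode the selected blocklet ids of one block, one byte per id. */
  public static byte[] encode(int[] selectedBlockletIds) {
    byte[] encoded = new byte[selectedBlockletIds.length];
    for (int i = 0; i < selectedBlockletIds.length; i++) {
      encoded[i] = (byte) selectedBlockletIds[i];
    }
    return encoded;
  }

  /** Decode the blocklet ids back on the executor side. */
  public static int[] decode(byte[] encoded) {
    int[] ids = new int[encoded.length];
    for (int i = 0; i < encoded.length; i++) {
      ids[i] = encoded[i] & 0xFF;
    }
    return ids;
  }
}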
Hi Vishal & Jacky,
Yes, we need to extract interfaces for keeping the index in memory (on-heap/off-heap), and the same interface can be implemented to store and retrieve the index from an external service such as a DB.

Not all segments will be kept in a single tree; the plan is to map one index file to one array. Since one segment may contain multiple index files, we will have multiple arrays per segment.

Yes, the number of blocklets per block is small, so for each task we send only the information about which blocklets of the block need to be scanned, and the serialization cost is therefore minimal.

Regards,
Ravindra
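A rough sketch of that per-segment layout, with hypothetical names (not CarbonData classes): a driver cache keyed by segment id, where each segment holds one sorted key array per index file.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: one flat, sorted start-key array per index file,
 * grouped under the segment that owns the file, so a segment written by
 * several tasks simply ends up with several arrays.
 */
public class SegmentIndexCache {

  private final Map<String, List<byte[][]>> segmentToIndexArrays = new HashMap<>();

  /** Register the sorted start-key array loaded from one index file. */
  public void addIndexArray(String segmentId, byte[][] sortedStartKeys) {
    segmentToIndexArrays
        .computeIfAbsent(segmentId, k -> new ArrayList<>())
        .add(sortedStartKeys);
  }

  public List<byte[][]> getIndexArrays(String segmentId) {
    return segmentToIndexArrays.getOrDefault(segmentId, new ArrayList<>());
  }
}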
On 19 May 2017, at 20:28, Ravindra Pesala <[hidden email]> wrote:

> Not all segments will be kept in single tree here, we try to keep one index file to one array. Here one segment may contain multiple indexes so we will have multiple arrays per segment.

One more suggestion here: we should keep the number of index files as small as possible to improve the IO of loading the index, for example one index file per segment. We could do this by issuing one more MapReduce job after loading the CarbonData files; the job would read all the footers and reduce them into one index file. What do you think?
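As an illustration of that merge step, here is a minimal sketch; the job structure, file naming, and class are all hypothetical, not part of CarbonData. It simply concatenates every per-task index file of a segment into one merged file while remembering which file each entry came from.

import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

/**
 * Hypothetical sketch of a post-load merge: copy all per-task index files
 * of one segment into a single merged index file, prefixing each entry
 * with the originating file name so the task mapping is not lost.
 */
public class SegmentIndexMerger {

  public static void merge(File segmentDir, File mergedIndexFile) throws IOException {
    File[] indexFiles = segmentDir.listFiles((dir, name) -> name.endsWith(".carbonindex"));
    if (indexFiles == null || indexFiles.length == 0) {
      return;
    }
    try (DataOutputStream out = new DataOutputStream(new FileOutputStream(mergedIndexFile))) {
      out.writeInt(indexFiles.length);
      for (File indexFile : indexFiles) {
        byte[] content = Files.readAllBytes(indexFile.toPath());
        out.writeUTF(indexFile.getName()); // file name carries the writing task's id
        out.writeInt(content.length);
        out.write(content);
      }
    }
  }
}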
Hi Jacky,
The number of index files per segment equals the number of nodes in the cluster (if there are 10 nodes, there will be 10 index files per segment). To merge all the index files into one, we would need to extend the thrift definition, because we have to maintain the mapping from each block's footer to its task id.

-Regards
Kumar Vishal
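For illustration, a minimal sketch of that footer-to-task mapping with purely hypothetical types (this is not the actual thrift definition): after the merge, each footer entry keeps track of the task that produced it, so a block can still be traced back to its original index file.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: map each task id to the block footer entries it
 * wrote, so the merged index file preserves the block-to-task relation.
 */
public class MergedIndexMapping {

  /** One block footer entry recorded in the merged index file. */
  public static class FooterEntry {
    public final String blockFileName;
    public final long footerOffset;

    public FooterEntry(String blockFileName, long footerOffset) {
      this.blockFileName = blockFileName;
      this.footerOffset = footerOffset;
    }
  }

  private final Map<String, List<FooterEntry>> taskIdToFooters = new HashMap<>();

  public void add(String taskId, FooterEntry entry) {
    taskIdToFooters.computeIfAbsent(taskId, k -> new ArrayList<>()).add(entry);
  }

  public List<FooterEntry> getFooters(String taskId) {
    return taskIdToFooters.getOrDefault(taskId, new ArrayList<>());
  }
}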