Apache CarbonData Dev Mailing List archive - Re: [Discussion] Minimize the Btree size and unify the driver and executor Btrees.

Apache CarbonData Dev Mailing List archive

Re: [Discussion] Minimize the Btree size and unify the driver and executor Btrees.

Posted by Jacky Li on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Minimize-the-Btree-size-and-unify-the-driver-and-executor-Btrees-tp12771p12935.html

> 在 2017年5月18日，下午3:31，Kumar Vishal <[hidden email]> 写道：
>
> Hi Ravi,
>
> I have few queries related to both the solution.
>
> 1. To reduce the java heap for Btree , we can remove the Btree data
> structure and use simple single array and do binary search on it. And also
> we should move this cached arrays to unsafe (offheap/onheap) to reduce the
> burden on GC.
>

I think now is the right time to abstract driver side index into an interface, which can have multiple implementation: onheap, offheap, external service, etc.

> Single array of object?? If user is loading small data in each load so we
> will save only intermediate node of btree by maintaining in array. Yes by
> maintaining all the metadata(startkey, min,max) in offheap will reduce
> memory footprint. Is it possible to maintain data in single byte array and
> maintaining the offset of each key for each column??
>

Do you mean put to all indexes for all segments in a single array? In that case, we can compare in-memory sequential scan and binary search which introduce random walk on memory. I feel sequential scan may beat random walk if the number of element is not many.

> 2. Unify the btree to single Btree instead of 2 and load at driver side.
> So that only one lookup can be done to find the blocklets directly. And
> executors are not required to load the btree for every query.
>
> If we unify the btree to single btree, then it will be loaded in driver
> side, so driver will need more memory to maintain this metadata and for
> each query driver needs to do blocklet level pruning, in case of concurrent
> query driver may be overloaded. And how we will send this metadata
> information (offset of each column in file and other metadata) to executor,
> serializing and deserializing this metadata cost also we need to consider.
>

Assuming there is only 4~8 blocklet in block, I think for a block, we can sent blockletId using a byte array. Should be ok, right?

> Please correct me if my comments are not valid.
>
> -Regards
> Kumar Vishal
>
>
> On Thu, May 18, 2017 at 12:33 PM, Ravindra Pesala <[hidden email]>
> wrote:
>
>> Hi Liang,
>>
>> Yes Liang , it will be done in 2 parts. At first reduce the size of the
>> btree and then merge the driver side and executor btree to single btree.
>>
>> Regards,
>> Ravindra.
>>
>> On 17 May 2017 at 19:28, Liang Chen <[hidden email]> wrote:
>>
>>> Hi Ravi
>>>
>>> Thank you bringing this improvement discussion to mailing list.
>>>
>>> One question , the point1 how to solve the below issues ? there are still
>>> two part index info in driver and executor side ?
>>> ------------------------------------------------------------
>>> ----------------------------------------
>>> And also chances of loading btree on each executor is more for every
>> query
>>> because there is no guarantee that same block goes to same executor every
>>> time. It will be worse in case of dynamic containers.
>>>
>>> Regards
>>> Liang
>>>
>>> 2017-05-17 7:33 GMT-04:00 Ravindra Pesala <[hidden email]>:
>>>
>>>> Hi,
>>>>
>>>> *1. Current problem.*
>>>> 1.There is more size taking on java heap to create Btree for index
>> file.
>>>> It is because we create multiple objects for each leaf node so it takes
>>>> more memory inside heap than actual size of index file. while doing LRU
>>>> cache also we are considering only index file size instead of objects
>>> size
>>>> so it impacts the eviction process of LRU cache.
>>>> 2. Currently we load one btree on driver side to find the blocks and
>>> load
>>>> another btree on executor side to find the blocklets. After we have
>>>> increased the blocklet size to 128 mb and decrease the table_block size
>>> to
>>>> 256 mb the number of nodes inside driver side btree and executor side
>>> btree
>>>> is not much different. So it would be overhead to read the same
>>> information
>>>> twice.
>>>> And also chances of loading btree on each executor is more for every
>>> query
>>>> because there is no guarantee that same block goes to same executor
>> every
>>>> time. It will be worse in case of dynamic containers.
>>>>
>>>> *2. Proposed solution.*
>>>> 1. To reduce the java heap for Btree , we can remove the Btree data
>>>> structure and use simple single array and do binary search on it. And
>>> also
>>>> we should move this cached arrays to unsafe (offheap/onheap) to reduce
>>> the
>>>> burden on GC.
>>>> 2. Unify the btree to single Btree instead of 2 and load at driver
>> side.
>>>> So that only one lookup can be done to find the blocklets directly. And
>>>> executors are not required to load the btree for every query.
>>>> We can consider moving this to separate metadata service eventually
>>>> once the memory footprint get reduced.
>>>>
>>>> First I will consider point 1 reduce the btree size after that I
>> consider
>>>> merging of btrees.
>>>>
>>>> Please comment on it.
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Ravindra.
>>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>> Liang
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>> Ravi
>>