[jira] [Updated] (CARBONDATA-2638) Implement driver min max caching for specified columns and segregate block and blocklet cache

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Gupta updated CARBONDATA-2638:
-------------------------------------
    Attachment:     (was: Driver_Block_Cache.docx)

> Implement driver min max caching for specified columns and segregate block and blocklet cache
> ---------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2638
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2638
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Manish Gupta
>            Assignee: Manish Gupta
>            Priority: Major
>         Attachments: Driver_Block_Cache.docx
>
>
> *Background*
> The current implementation of Blocklet dataMap caching in the driver caches the min and max values of all columns in the schema by default.
> *Problem*
> The problem with this implementation is that as the number of loads increases, the memory required to hold the min and max values also increases considerably. In most scenarios there is a single driver, and the memory configured for the driver is less than the memory configured for an executor. As the memory requirement keeps growing, the driver can eventually run out of memory, which makes the situation even worse.
> *Solution*
> 1. Cache only the required columns in the driver.
> 2. Segregate the block-level and Blocklet-level caches.
> For more details, please check the attached document.
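
The two points in the proposed solution can be illustrated with a minimal Python sketch. It is not CarbonData code and does not reflect the design in the attached document: the names `BlockletStats`, `build_cache`, and `prune`, the `level` values, and the integer-only statistics are all assumptions made for illustration. The sketch keeps min/max values only for a chosen list of columns and stores either one entry per block (Blocklet statistics merged) or one entry per Blocklet.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

MinMax = Tuple[int, int]  # (min, max); integers only, to keep the sketch short


@dataclass(frozen=True)
class BlockletStats:
    """Hypothetical per-Blocklet statistics as read from an index file."""
    block_id: str
    blocklet_id: int
    min_max: Dict[str, MinMax]  # column name -> (min, max)


def build_cache(stats: Iterable[BlockletStats],
                cached_columns: List[str],
                level: str) -> Dict[tuple, Dict[str, MinMax]]:
    """Build a driver-side cache for the chosen columns only.

    level="block" merges all Blocklets of a block into one entry;
    level="blocklet" keeps one entry per Blocklet.
    """
    if level not in ("block", "blocklet"):
        raise ValueError("level must be 'block' or 'blocklet'")
    cache: Dict[tuple, Dict[str, MinMax]] = {}
    for s in stats:
        key = (s.block_id,) if level == "block" else (s.block_id, s.blocklet_id)
        entry = cache.setdefault(key, {})
        for col in cached_columns:
            if col not in s.min_max:
                continue
            lo, hi = s.min_max[col]
            if col in entry:
                cur_lo, cur_hi = entry[col]
                entry[col] = (min(cur_lo, lo), max(cur_hi, hi))
            else:
                entry[col] = (lo, hi)
    return cache


def prune(cache: Dict[tuple, Dict[str, MinMax]], column: str, value: int) -> List[tuple]:
    """Return the keys that may contain `value` for an equality filter.

    Entries with no cached statistics for `column` cannot be pruned,
    so they are always returned.
    """
    return [key for key, entry in cache.items()
            if column not in entry or entry[column][0] <= value <= entry[column][1]]


stats = [
    BlockletStats("b0", 0, {"id": (1, 10), "age": (20, 30), "score": (0, 5)}),
    BlockletStats("b0", 1, {"id": (11, 20), "age": (25, 40), "score": (3, 9)}),
    BlockletStats("b1", 0, {"id": (21, 30), "age": (18, 22), "score": (1, 2)}),
]

block_cache = build_cache(stats, ["id"], "block")
blocklet_cache = build_cache(stats, ["id"], "blocklet")
```

In this sketch the block-level cache holds two entries and the Blocklet-level cache holds three, each with statistics for `id` only. The trade-off it illustrates is that a filter on an uncached column such as `age` cannot prune anything in the driver, and a block-level entry prunes more coarsely than a Blocklet-level one, in exchange for a smaller driver memory footprint.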



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)