Manish Gupta created CARBONDATA-2638:
----------------------------------------
Summary: Implement driver min max caching for specified columns and segregate block and blocklet cache
Key: CARBONDATA-2638
URL: https://issues.apache.org/jira/browse/CARBONDATA-2638
Project: CarbonData
Issue Type: New Feature
Reporter: Manish Gupta
Assignee: Manish Gupta
Attachments: Driver_Block_Cache.docx
*Background*
In the current implementation, the Blocklet DataMap cache in the driver stores the min and max values of all the columns in the schema by default.
*Problem*
The problem with this implementation is that, as the number of loads increases, the memory required to hold the min and max values also increases considerably. In most scenarios there is a single driver, and the memory configured for the driver is less than that configured for the executors. As the memory requirement keeps growing, the driver can even run out of memory, which makes the situation worse.
*Solution*
1. Cache the min and max values of only the required columns in the driver
2. Segregate the block-level and blocklet-level caches
For more details, please refer to the attached document (Driver_Block_Cache.docx).
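The two solution points can be illustrated with a minimal sketch, assuming min/max statistics simplified to `long` values and Java 16+ (for records). All names here (`DriverMinMaxCacheSketch`, `CacheLevel`, `Blocklet`, `Block`, `Entry`, `buildCache`, `project`, and the file name `part-0.carbondata`) are hypothetical illustrations, not CarbonData APIs, and the attached design document may take a different approach. The sketch keeps min/max only for a configured set of columns, and builds either one cache entry per block (blocklet ranges merged) or one entry per blocklet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names are CarbonData APIs.
public class DriverMinMaxCacheSketch {

    // Granularity of the driver-side cache (assumed: one entry per block or per blocklet).
    enum CacheLevel { BLOCK, BLOCKLET }

    // Per-column {min, max} statistics of one blocklet, simplified to long values.
    record Blocklet(Map<String, long[]> minMaxByColumn) {}

    // A block (data file) consists of one or more blocklets.
    record Block(String path, List<Blocklet> blocklets) {}

    // A cached entry; blockletId is -1 for block-level entries.
    record Entry(String path, int blockletId, Map<String, long[]> minMax) {}

    // Keeps only the statistics of the columns configured for caching.
    static Map<String, long[]> project(Map<String, long[]> stats, Set<String> columns) {
        Map<String, long[]> out = new HashMap<>();
        for (String column : columns) {
            long[] minMax = stats.get(column);
            if (minMax != null) {
                out.put(column, minMax.clone());
            }
        }
        return out;
    }

    // Builds the driver cache for the selected columns at the requested level.
    static List<Entry> buildCache(List<Block> blocks, Set<String> columns, CacheLevel level) {
        List<Entry> cache = new ArrayList<>();
        for (Block block : blocks) {
            List<Blocklet> blocklets = block.blocklets();
            if (level == CacheLevel.BLOCKLET) {
                // One entry per blocklet: finer pruning, more driver memory.
                for (int i = 0; i < blocklets.size(); i++) {
                    cache.add(new Entry(block.path(), i,
                            project(blocklets.get(i).minMaxByColumn(), columns)));
                }
            } else {
                // One entry per block: min of the blocklet mins, max of the blocklet maxes.
                Map<String, long[]> merged = new HashMap<>();
                for (Blocklet blocklet : blocklets) {
                    project(blocklet.minMaxByColumn(), columns).forEach((column, minMax) ->
                            merged.merge(column, minMax, (a, b) ->
                                    new long[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])}));
                }
                cache.add(new Entry(block.path(), -1, merged));
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        Block block = new Block("part-0.carbondata", List.of(
                new Blocklet(Map.of("id", new long[] {1, 50}, "age", new long[] {18, 40})),
                new Blocklet(Map.of("id", new long[] {51, 99}, "age", new long[] {20, 65}))));

        // Block level, caching only "id": a single entry with the merged range.
        List<Entry> blockLevel = buildCache(List.of(block), Set.of("id"), CacheLevel.BLOCK);
        System.out.println(blockLevel.size() + " " + Arrays.toString(blockLevel.get(0).minMax().get("id")));

        // Blocklet level, caching "id" and "age": one entry per blocklet.
        List<Entry> blockletLevel = buildCache(List.of(block), Set.of("id", "age"), CacheLevel.BLOCKLET);
        System.out.println(blockletLevel.size());
    }
}
```

Running `main` prints `1 [1, 99]` followed by `2`. In this sketch, driver memory grows with (number of entries) × (number of cached columns), so restricting the column set and caching at block level both shrink the cache, while blocklet-level entries allow finer-grained pruning at a higher memory cost.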
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)