Posted by
sujith chacko on
Feb 19, 2019; 10:10am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-DDLs-to-operate-on-CarbonLRUCache-tp75197p75236.html
Hi Naman,
Thanks for proposing this feature; it seems pretty interesting. A few
points I want to bring up here:
1) I think we will require a detailed design document for this feature in
which all the DDLs you are going to expose are clearly described, since
frequent changes to DDLs are not recommended later on.
It would be better to also cover the scenarios that can impact your DDL
operations, such as cross-session DDL operations,
e.g. one user is trying to clear the cache for a table while another user
executes the show cache command. Basically, you should also describe how
you will handle all the synchronization scenarios.
2) Spark has already exposed DDLs for clearing caches, as below; please
refer to them to get more insight into this DDL. It is better to follow a
standard syntax:
"CLEAR CACHE"
"UNCACHE TABLE (IF EXISTS)? tableIdentifier"
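To make the semantic difference between the two Spark statements concrete, here is a toy sketch (plain Python with hypothetical names; Spark's real cache lives in the driver and is not a dict) of CLEAR CACHE versus UNCACHE TABLE (IF EXISTS)?:

```python
# Toy driver-side cache keyed by table name; hypothetical, only for
# illustrating the semantics of the two Spark cache DDLs.
class ToyCache:
    def __init__(self):
        self._entries = {}  # table name -> cached object

    def cache(self, table, data):
        self._entries[table] = data

    def clear_cache(self):
        # "CLEAR CACHE": drop every cached entry, across all tables.
        self._entries.clear()

    def uncache_table(self, table, if_exists=False):
        # "UNCACHE TABLE (IF EXISTS)? tableIdentifier": drop one table;
        # without IF EXISTS, an unknown table is an error.
        if table not in self._entries:
            if if_exists:
                return
            raise KeyError("Table or view not found: " + table)
        del self._entries[table]

cache = ToyCache()
cache.cache("t1", object())
cache.cache("t2", object())
cache.uncache_table("t1")                  # removes only t1
cache.uncache_table("t3", if_exists=True)  # missing table: no-op, no error
cache.clear_cache()                        # removes everything left
```

A CLEAN CACHE / CLEAN CACHE FOR TABLE pair with the same two behaviors would stay close to this established syntax.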
3) How will you deal with the drop table case? I think you should clear
the respective cache entries as well. Please mention these scenarios
clearly in your design document.
4) 0 for point 5, as I think you need to explain more in your design
document about the scenarios and the need for this feature; this DDL can
introduce more complexity into the system.
E.g. by the time the system calculates the table size, a new segment can
get added or an existing segment can get modified, so you basically need
to take a lock so that these kinds
of synchronization issues can be tackled in a better manner.
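The race described above can be sketched as follows (plain Python with threading and a hypothetical in-memory segment store; the real fix would use Carbon's table-level locks rather than a Python lock). Holding the lock for the whole summation keeps a concurrent writer from modifying the segment list mid-calculation:

```python
import threading

# Hypothetical stand-in for a table's segment metadata. The lock gives
# the size calculation a consistent snapshot of the segment list.
class SegmentStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._segments = []  # per-segment sizes in bytes

    def add_segment(self, size):
        with self._lock:
            self._segments.append(size)

    def table_size(self):
        # Hold the lock across the whole sum so a concurrent
        # add_segment cannot change the list halfway through.
        with self._lock:
            return sum(self._segments)

store = SegmentStore()
writers = [threading.Thread(target=store.add_segment, args=(1024,))
           for _ in range(8)]
for w in writers:
    w.start()
for w in writers:
    w.join()
total = store.table_size()  # 8 * 1024 once all writers have finished
```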
Overall, I think the approach should be well documented before you start
the implementation. Please let me know if you have any clarifications or
suggestions regarding the above points.
Regards,
Sujith
On Mon, Feb 18, 2019 at 3:35 PM Naman Rastogi <
[hidden email]>
wrote:
> Hi all,
>
> Currently carbon supports a caching mechanism for Blocks/Blocklets. Even
> though it allows the end user to set the cache size, it is still very
> limited in functionality: the user arbitrarily chooses the carbon
> property *carbon.max.driver.lru.cache.size* before launching the
> carbon session, with no idea of how much cache should be set for
> his/her requirement.
>
> For this problem, I propose the following improvements in the carbon
> caching mechanism.
>
> 1. Support DDL for showing current cache used per table.
> 2. Support DDL for showing current cache used for a particular table.
> For these two points, QiangCai already has a PR:
> https://github.com/apache/carbondata/pull/3078
> 3. Support DDL for clearing all the entries in the cache.
> This will look like:
> CLEAN CACHE
>
> 4. Support DDL for clearing cache for a particular table.
> This will clear all the entries in the cache which belong to a
> particular table. This will look like:
> CLEAN CACHE FOR TABLE tablename
>
> 5. Support DDL to estimate the required cache for a particular table.
> As explained above, the user does not know beforehand how much cache
> will be required for his/her current work. This DDL will let the
> user estimate how much cache will be required for a particular
> table. For this, we will launch a job, estimate the memory
> required for all the blocks, and sum it up.
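A rough sketch of the estimation in point 5 (plain Python; the block metadata and field names here are hypothetical, and the real DDL would launch a distributed job to gather per-block index sizes from the segments):

```python
# Hypothetical per-block metadata; "index_size_bytes" stands in for the
# driver-side memory a block's index would occupy in the cache.
def estimate_cache_bytes(blocks):
    # Sum the per-block index footprint to approximate the total
    # Block/Blocklet cache the table would need.
    return sum(b["index_size_bytes"] for b in blocks)

blocks = [
    {"path": "part-0", "index_size_bytes": 64 * 1024},
    {"path": "part-1", "index_size_bytes": 96 * 1024},
    {"path": "part-2", "index_size_bytes": 32 * 1024},
]
required = estimate_cache_bytes(blocks)  # 192 KiB for this toy input
```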
>
> 6. Dynamic "max cache size" configuration
> Suppose the user now knows the cache size he needs, but the
> current system requires the user to set
> *carbon.max.driver.lru.cache.size* and restart the JDBC server for
> it to take effect. For this, I suggest making the carbon
> property *carbon.max.driver.lru.cache.size* dynamically
> configurable, which allows the user to change the max LRU cache
> size on the fly.
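A minimal sketch of what a dynamically resizable LRU cache could look like (plain Python; hypothetical, since the real CarbonLRUCache is Java and sized in bytes rather than entry count). The interesting case is shrinking max_size at runtime, which must evict least-recently-used entries immediately:

```python
from collections import OrderedDict

# Toy LRU cache whose capacity can be changed on the fly, illustrating
# what a dynamic carbon.max.driver.lru.cache.size would require.
class ResizableLRU:
    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = OrderedDict()  # oldest (LRU) entry first

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        self._evict()

    def get(self, key):
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def set_max_size(self, new_max):
        # The dynamic-configuration case: shrink or grow at runtime,
        # evicting immediately if the cache is now over its limit.
        self.max_size = new_max
        self._evict()

    def _evict(self):
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop the LRU entry

lru = ResizableLRU(max_size=4)
for k in "abcd":
    lru.put(k, k.upper())
lru.get("a")         # "a" becomes most recently used
lru.set_max_size(2)  # evicts "b" and "c"; "d" and "a" survive
```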
>
> Any suggestion from the community is greatly appreciated.
>
> Thanks
>
> Regards
>
> Naman Rastogi
> Technical Lead - BigData Kernel
> Huawei Technologies India Pvt. Ltd.
>