Apache CarbonData Dev Mailing List archive

Re: [DISCUSS] Distributed CarbonStore

Posted by Jacky Li on Aug 05, 2018; 12:37pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-Distributed-CarbonStore-tp58624p58961.html

+1

I think it is a good new feature to have, but the development effort is quite high, and I am worried about the release cycle getting longer. Can you define a roadmap for this feature, so that it can be delivered in phases across future versions?

Do you have anything in mind for the roadmap?

Regards,
Jacky

> On Aug 2, 2018, at 11:43 AM, Ajith shetty <[hidden email]> wrote:
>
> Hi all
>
> Currently the CarbonStore is tightly coupled with the FileSystem interface and runs inside the application's JVM process, as in Spark. We could instead make CarbonStore run as a separate service that is accessed over the network via RPC. So, as a follow-up to CARBONDATA-2688 (CarbonStore Java API and REST API), we can make the carbon store distributed.
>
> This has some advantages.
>
> · Distributed CarbonStore can support parallel scanning, i.e. multiple tasks can scan data in parallel, possibly with a higher parallelism factor than the compute layer
>
> · Distributed CarbonStore can provide an index service to multiple applications (Spark, Flink, Presto), so that the index is shared to save resources
>
> · Distributed CarbonStore resource consumption is isolated from the application and can be scaled easily to support higher workloads
>
> · As a future improvement, Distributed CarbonStore can implement a query cache since it has independent resources
>
>
>
> Distributed CarbonStore will have 2 main deployment parts:
>
> 1. A cluster of remote carbon store services
>
> 2. An SDK that acts as a client for communicating with the store
>
> Please provide your inputs/suggestions. If the idea sounds promising, I will go ahead and create a JIRA and sub-JIRAs for it.
>
> Regards
> Ajith
>
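
For readers of the archive, the two deployment parts in the quoted proposal (a store service cluster plus a client SDK) could look roughly like the sketch below. This is only an illustration under stated assumptions: the names StoreClient, StoreNode, LocalStoreClient and scanSlice are hypothetical and do not come from the CarbonData code base or from CARBONDATA-2688, and the "remote" node is an in-memory stand-in called directly instead of over RPC.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: none of these types exist in CarbonData.
public class DistributedStoreSketch {

    /** What an application (Spark, Flink, Presto) would see through the SDK. */
    interface StoreClient {
        List<String> scan(String table, int parallelism);
    }

    /** In-memory stand-in for one node of the store service cluster. */
    static class StoreNode {
        private final Map<String, List<String>> tables = new ConcurrentHashMap<>();

        void load(String table, List<String> rows) {
            tables.put(table, new ArrayList<>(rows));
        }

        // Returns every totalSlices-th row starting at index 'slice'.
        // A real service would read data files and consult its shared index here.
        List<String> scanSlice(String table, int slice, int totalSlices) {
            List<String> rows = tables.getOrDefault(table, Collections.emptyList());
            List<String> out = new ArrayList<>();
            for (int i = slice; i < rows.size(); i += totalSlices) {
                out.add(rows.get(i));
            }
            return out;
        }
    }

    /** Client side. A real SDK would issue RPC calls; this one calls the node directly. */
    static class LocalStoreClient implements StoreClient {
        private final StoreNode node;

        LocalStoreClient(StoreNode node) {
            this.node = node;
        }

        @Override
        public List<String> scan(String table, int parallelism) {
            // One scan task per slice, run in parallel, results concatenated in slice order.
            return IntStream.range(0, parallelism)
                    .parallel()
                    .mapToObj(slice -> node.scanSlice(table, slice, parallelism))
                    .flatMap(List::stream)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        StoreNode node = new StoreNode();
        node.load("t1", Arrays.asList("a", "b", "c", "d", "e"));
        StoreClient client = new LocalStoreClient(node);
        System.out.println(client.scan("t1", 2));
    }
}
```

The point of the split is that the application depends only on the client interface, so the scan parallelism and any shared index live on the service side and can be scaled independently of the compute layer.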