Posted by
Jacky Li on
Oct 20, 2017; 9:12am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Carbon-Store-abstraction-tp24337p24430.html
The markup format in the earlier mail was incorrect. Please refer to this one.
carbondata-store is responsible for providing the following interfaces (a rough sketch in Java follows the list):
1. Table management:
- Initialize and persist table metadata when the integration module creates a table. Currently, the metadata consists of `TableInfo`. The table path should be specified by the integration module
- Delete metadata and data in the table path when the integration module drops a table
- Retrieve `TableInfo` from the table path
- Check whether a table exists
- Alter metadata in `TableInfo`
2. Segment management (segments are operated on in a transactional way):
- Open a new segment when the integration module loads new data
- Commit the segment when the data operation completes successfully
- Close the segment when the data operation fails
- Delete a segment when the integration module drops it
- Retrieve segment information by segmentId
3. Compaction management:
- Compaction policy for deciding whether compaction should be carried out
4. Data operation (carbondata-store provides map functions in a map-reduce manner):
- Data loading map function
- Delete segment map function
- Other operations that involve map-side work (basically, the `internalCompute` function of every RDD in the current Spark integration module)
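
To make the discussion concrete, here is a minimal sketch in Java of what these interfaces could look like. All names and signatures below (`CarbonStore`, `Segment`, `openNewSegment`, `CompactionPolicy`, `DataMapFunction`, and so on) are illustrative assumptions for this proposal, not existing classes:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Placeholder types for this sketch. TableInfo exists in carbon metadata
// today; the others are hypothetical stand-ins.
class TableInfo { /* table name, schema, properties, ... */ }
class Segment { String segmentId; }
class SegmentInfo { String segmentId; String status; }
class InputSplit { /* describes one unit of map-side work */ }

// Store-level interface that every integration module could depend on.
interface CarbonStore {

  // 1. Table management
  void createTable(String tablePath, TableInfo tableInfo) throws IOException;
  void dropTable(String tablePath) throws IOException;
  TableInfo getTableInfo(String tablePath) throws IOException;
  boolean tableExists(String tablePath);
  void alterTable(String tablePath, TableInfo updatedInfo) throws IOException;

  // 2. Segment management, operated transactionally
  Segment openNewSegment(String tablePath) throws IOException;
  void commitSegment(Segment segment) throws IOException;
  void closeSegment(Segment segment) throws IOException; // on failed operation
  void dropSegment(String tablePath, String segmentId) throws IOException;
  SegmentInfo getSegmentInfo(String tablePath, String segmentId)
      throws IOException;
}

// 3. Compaction management: a pluggable policy that decides whether
// compaction should be carried out on the given segments.
interface CompactionPolicy {
  boolean shouldCompact(List<SegmentInfo> segments);
}

// 4. Data operation: a map-side function in map-reduce style, playing the
// role of internalCompute in the RDDs of the current Spark integration.
interface DataMapFunction<T> {
  Iterator<T> compute(InputSplit split) throws IOException;
}
```

With an abstraction like this, the Spark, Presto, and Hive integration modules would each implement only the engine-specific glue and share the same store-level code.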
> On Oct 20, 2017, at 4:56 PM, Raghunandan S <
[hidden email]> wrote:
>
> I think we need to integrate with Presto and Hive first and then refactor.
> This gives a clear idea of what we want to achieve. Each processing engine
> is different in its own way, and integrating first would give us a clear
> idea of what's required in CarbonData.
> On Fri, 20 Oct 2017 at 1:01 PM, Liang Chen <
[hidden email]> wrote:
>
>> Hi
>>
>> Thank you for starting this discussion. I agree: to expose a clear
>> interface to users, there is some optimization work to do.
>>
>> Can you list more details about your proposal? For example: which classes
>> do you propose to move to carbon store, and which APIs do you propose to
>> create and expose to users?
>> I suggest we discuss and confirm your proposal on the dev list first, then
>> start to create sub-tasks in Jira.
>>
>> Regards
>> Liang
>>
>>
>> Jacky Li wrote
>>> Hi community,
>>>
>>> I am proposing to create a carbondata-store module to abstract the carbon
>>> store concept. The reasons are:
>>>
>>> 1. Initially, carbon was designed as a file format; as it evolved to
>>> provide more features, it implemented more and more functionality in the
>>> spark integration module. However, as the community integrates more and
>>> more compute frameworks with carbon, this functionality is duplicated
>>> across the integration layers. Ideally, it could be unified and provided
>>> in one place.
>>>
>>> 2. The current interface of carbondata exposed to users is SQL, but the
>>> developer interface for those who want to integrate a compute engine is
>>> not very clear.
>>>
>>> 3. Carbon supports many SQL commands, but they are implemented through
>>> Spark RDDs only, so they are not sharable across compute frameworks.
>>>
>>> Due to these reasons, for the long-term future of carbondata, I think it
>>> is better to abstract the interface for compute engine integration within
>>> a new module called carbondata-store. It can wrap all the store-level
>>> functionality above the file format in a module independent of any
>>> compute engine, so that every integration module can depend on it and
>>> duplicated code is removed.
>>>
>>> This is a continuous effort for the long term; I will break this work
>>> into subtasks and start by creating JIRA issues, if you agree.
>>>
>>> Regards,
>>> Jacky Li
>>
>>
>>
>>
>>
>> --
>> Sent from:
>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/