http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Carbon-Store-abstraction-tp24337p24438.html
Thank you for steering this activity. Yes, there is a need to refactor the
code to get the store management out of the spark integration module. It
becomes …
1. Is it really necessary to extract three modules? I think we can create …
2. We had also better rename the current carbon-core module to carbon-scan or …
3. Even table status creation and updating should belong to …
4. I think the data loading map function is CarbonOutputFormat, and it should
be … independent of each other, and we can do it across versions also. And
also …
> The markup format in the earlier mail is incorrect. Please refer to this one.
>
> carbondata-store is responsible for providing the following interfaces (an
> illustrative sketch follows this list):
> 1. Table management:
> - Initialize and persist table metadata when the integration module creates
> a table. Currently, the metadata consists of `TableInfo`. The table path
> should be specified by the integration module
> - Delete metadata and data in the table path when the integration module
> drops a table
> - Retrieve `TableInfo` from the table path
> - Check whether a table exists
> - Alter metadata in `TableInfo`
> 2. Segment management (segments are operated on in a transactional way):
> - Open a new segment when the integration module loads new data
> - Commit the segment when the data operation completes successfully
> - Close the segment when the data operation fails
> - Delete a segment when the integration module drops it
> - Retrieve segment information by segmentId
> 3. Compaction management:
> - Compaction policy for deciding whether compaction should be carried out
> 4. Data operations (carbondata-store provides map functions in a map-reduce
> manner):
> - Data loading map function
> - Delete segment map function
> - Other operations that involve a map-side step (basically, the
> `internalCompute` function in all RDDs in the current spark integration
> module)
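>
> To make this concrete, here is a minimal Java sketch of such an interface.
> Everything except `TableInfo` (carbon's existing table metadata class) is
> hypothetical and for illustration only, not a final API:
>
>   import java.io.IOException;
>   import java.util.Iterator;
>   import org.apache.carbondata.core.metadata.schema.table.TableInfo;
>
>   public interface CarbonStore {
>
>     // 1. Table management: metadata is persisted under the table path
>     //    chosen by the integration module
>     void createTable(TableInfo tableInfo, String tablePath) throws IOException;
>     void dropTable(String tablePath) throws IOException;  // metadata and data
>     TableInfo getTable(String tablePath) throws IOException;
>     boolean tableExists(String tablePath) throws IOException;
>     void alterTable(String tablePath, TableInfo updated) throws IOException;
>
>     // 2. Segment management, transactional: open, then commit on success
>     //    or close on failure
>     String openSegment(String tablePath) throws IOException;  // returns segmentId
>     void commitSegment(String tablePath, String segmentId) throws IOException;
>     void closeSegment(String tablePath, String segmentId) throws IOException;
>     void deleteSegment(String tablePath, String segmentId) throws IOException;
>     SegmentInfo getSegment(String tablePath, String segmentId) throws IOException;
>
>     // 3. Compaction management: the policy decides whether compaction
>     //    should be carried out; the engine executes it
>     boolean shouldCompact(String tablePath) throws IOException;
>
>     // 4. Map-side data operations; each engine wraps these in its own
>     //    distributed execution (an RDD task, a Presto split, etc.)
>     void loadDataMapFunction(String tablePath, String segmentId,
>         Iterator<Object[]> rows) throws IOException;
>     void deleteSegmentMapFunction(String tablePath, String segmentId)
>         throws IOException;
>   }
>
>   /** Hypothetical holder for the segment information mentioned above. */
>   class SegmentInfo {
>     String segmentId;
>     String status;  // e.g. OPEN, SUCCESS, FAILED
>   }
>
> An integration module would then drive the transactional segment flow
> roughly like this (again illustrative; `store`, `tablePath` and
> `rowsForThisTask` are assumed to be in scope):
>
>   String segmentId = store.openSegment(tablePath);
>   try {
>     // each task runs the map function over its partition of the input
>     store.loadDataMapFunction(tablePath, segmentId, rowsForThisTask);
>     store.commitSegment(tablePath, segmentId);
>   } catch (IOException e) {
>     store.closeSegment(tablePath, segmentId);
>     throw e;
>   }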
>
>
> > On 20 Oct 2017, at 4:56 PM, Raghunandan S <[hidden email]> wrote:
> >
> > I think we need to integrate with Presto and Hive first and then
> > refactor. This gives a clear idea of what we want to achieve. Each
> > processing engine is different in its own way, and integrating first
> > would give us a clear idea of what's required in CarbonData.
> > On Fri, 20 Oct 2017 at 1:01 PM, Liang Chen <[hidden email]> wrote:
> >
> >> Hi
> >>
> >> Thank you for starting this discussion. I agree: to expose a clear
> >> interface to users, there is some optimization work to do.
> >>
> >> Can you list more details about your proposal? For example: which classes
> >> you propose to move to the carbon store, and which APIs you propose to
> >> create and expose to users.
> >> I suggest we discuss and confirm your proposal on the dev list first,
> >> then start to create sub-tasks in Jira.
> >>
> >> Regards
> >> Liang
> >>
> >>
> >> Jacky Li wrote
> >>> Hi community,
> >>>
> >>> I am proposing to create a carbondata-store module to abstract the
> >>> carbon store concept. The reasons are:
> >>>
> >>> 1. Initially, carbon was designed as a file format; as it evolved to
> >>> provide more features, it implemented more and more functionality in
> >>> the spark integration module. However, as the community tries to
> >>> integrate more and more compute frameworks with carbon, this
> >>> functionality is duplicated across the integration layers. Ideally, it
> >>> could be unified and provided in one place.
> >>>
> >>> 2. The interface carbondata currently exposes to users is SQL, but the
> >>> interface for developers who want to do compute engine integration is
> >>> not very clear.
> >>>
> >>> 3. There are many SQL commands that carbon supports, but they are
> >>> implemented through spark RDDs only, so they are not sharable across
> >>> compute frameworks.
> >>>
> >>> For these reasons, and for the long-term future of carbondata, I think
> >>> it is better to abstract the interface for compute engine integration
> >>> into a new module called carbondata-store. It can wrap all store-level
> >>> functionality above the file format in a module independent of any
> >>> compute engine, so that every integration module can depend on it and
> >>> duplicated code is removed. A rough sketch of this layering is below.
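> >>>
> >>> As an illustration only (module names other than carbondata-store
> >>> follow the integrations discussed in this thread):
> >>>
> >>>   integration modules (spark, presto, hive, ...)
> >>>                |  depend on
> >>>   carbondata-store (table/segment/compaction management, map functions)
> >>>                |  built on
> >>>   carbon file format (read/write)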
> >>>
> >>> This is a continuous long-term effort; if you agree, I will break this
> >>> work into subtasks and start by creating JIRA issues.
> >>>
> >>> Regards,
> >>> Jacky Li
> >>
> >>
> >> --
> >> Sent from:
> >> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >>
>
>