Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542797#comment-15542797 ]

ASF GitHub Bot commented on CARBONDATA-284:
-------------------------------------------

GitHub user jackylk opened a pull request:

https://github.com/apache/incubator-carbondata/pull/208

[CARBONDATA-284][WIP] Abstracting index and segment interface

This PR adds a new User API and Dev API to the carbon-hadoop module:

### User API
- `CarbonColumnarInputFormat/OutputFormat`: uses the current `CarbonInputFormat` as its internal implementation.
- `CarbonRowInputFormat/OutputFormat`: not implemented yet.
- `CarbonOutputCommitter`: manages segment commits.

They are based on `CarbonInputFormatBase/OutputFormatBase`.
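A minimal sketch of the layering described above, assuming simplified in-memory types instead of Hadoop's `InputFormat`, `InputSplit` and `RecordReader`. Only the class names come from the PR; the `Split` type, the constructors and the `read` method are hypothetical, and the idea that the two formats differ only in record shape (rows versus column vectors) is an assumption about the design, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hadoop InputSplit: one block of row data.
final class Split {
    final String blockId;
    final List<Object[]> rows; // each row is an array of column values

    Split(String blockId, List<Object[]> rows) { this.blockId = blockId; this.rows = rows; }
}

// Shared base (hypothetical signatures): split planning lives here once,
// subclasses only decide what a record looks like.
abstract class CarbonInputFormatBase<T> {
    private final List<Split> splits;

    CarbonInputFormatBase(List<Split> splits) { this.splits = splits; }

    // Assumption: the real module would consult segments and indexes here.
    List<Split> getSplits() { return splits; }

    abstract List<T> read(Split split);
}

// Row view: one record per row.
final class CarbonRowInputFormat extends CarbonInputFormatBase<Object[]> {
    CarbonRowInputFormat(List<Split> splits) { super(splits); }

    @Override
    List<Object[]> read(Split split) { return split.rows; }
}

// Columnar view: one record per column, holding all of that column's values in the split.
final class CarbonColumnarInputFormat extends CarbonInputFormatBase<Object[]> {
    CarbonColumnarInputFormat(List<Split> splits) { super(splits); }

    @Override
    List<Object[]> read(Split split) {
        List<Object[]> columns = new ArrayList<>();
        if (split.rows.isEmpty()) {
            return columns;
        }
        int numCols = split.rows.get(0).length;
        for (int c = 0; c < numCols; c++) {
            Object[] column = new Object[split.rows.size()];
            for (int r = 0; r < split.rows.size(); r++) {
                column[r] = split.rows.get(r)[c];
            }
            columns.add(column);
        }
        return columns;
    }
}
```

Keeping split computation in the base class means a future row format can reuse the same segment and index logic and only supply its own reader.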

### Dev API
- `Segment`: an abstract class that represents a single load of data. `CarbonInputFormatBase` uses it to get all `InputSplit`s matching a `QueryModel`, and `CarbonOutputCommitter` uses it to prepare for reading. Example implementations are `IndexedSegment` and `StreamingSegment`.
- `SegmentManager`: an interface for managing segments. The current implementation is `ZkSegmentManager`, which still needs to be mapped to the existing logic.
- `Index`: an interface used by `IndexedSegment` to filter `InputSplit`s. The current implementation is `InMemoryBTreeIndex`, which loads the index into the driver's memory.
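The Dev API pieces could fit together roughly as follows. This is a sketch under stated assumptions, not the PR's code: `Split`, `QueryModel` (reduced to a key range), `InMemorySortedIndex`, `InMemorySegmentManager`, `SplitPlanner` and every method name are hypothetical, and a list of per-block min/max keys sorted by `minKey` stands in for the B+ tree used by `InMemoryBTreeIndex`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split: a block whose sort-key values fall within [minKey, maxKey].
final class Split {
    final String blockId;
    final long minKey, maxKey;
    Split(String blockId, long minKey, long maxKey) { this.blockId = blockId; this.minKey = minKey; this.maxKey = maxKey; }
}

// Hypothetical query model: a closed key-range filter [from, to].
final class QueryModel {
    final long from, to;
    QueryModel(long from, long to) { this.from = from; this.to = to; }
}

// Index: returns only the splits that may contain rows matching the query.
interface Index {
    List<Split> filter(QueryModel query);
}

// Driver-memory index; a sorted list stands in for a real B+ tree.
final class InMemorySortedIndex implements Index {
    private final List<Split> blocks;

    InMemorySortedIndex(List<Split> blocks) {
        this.blocks = new ArrayList<>(blocks);
        this.blocks.sort(Comparator.comparingLong((Split b) -> b.minKey));
    }

    @Override
    public List<Split> filter(QueryModel q) {
        List<Split> hits = new ArrayList<>();
        for (Split b : blocks) {
            if (b.minKey > q.to) {
                break; // sorted by minKey, so no later block can overlap either
            }
            if (b.maxKey >= q.from) {
                hits.add(b);
            }
        }
        return hits;
    }
}

// Segment: one load of data that can enumerate its splits for a query.
abstract class Segment {
    final String id;
    Segment(String id) { this.id = id; }
    abstract List<Split> getSplits(QueryModel query);
}

// IndexedSegment delegates pruning to whatever Index it was given.
final class IndexedSegment extends Segment {
    private final Index index;
    IndexedSegment(String id, Index index) { super(id); this.index = index; }

    @Override
    List<Split> getSplits(QueryModel query) { return index.filter(query); }
}

// SegmentManager: tracks which committed segments are visible to readers.
interface SegmentManager {
    void commit(Segment segment);
    List<Segment> validSegments();
}

// In-memory stand-in; judging by its name, ZkSegmentManager presumably keeps this state in ZooKeeper.
final class InMemorySegmentManager implements SegmentManager {
    private final Map<String, Segment> segments = new LinkedHashMap<>();

    @Override
    public void commit(Segment segment) { segments.put(segment.id, segment); }

    @Override
    public List<Segment> validSegments() { return new ArrayList<>(segments.values()); }
}

// Roughly what an input format base might do: gather matching splits from every valid segment.
final class SplitPlanner {
    static List<Split> plan(SegmentManager manager, QueryModel query) {
        List<Split> result = new ArrayList<>();
        for (Segment s : manager.validSegments()) {
            result.addAll(s.getSplits(query));
        }
        return result;
    }
}
```

Because `IndexedSegment` only depends on the `Index` interface, replacing the in-memory index with another implementation leaves the segment and split-planning code unchanged.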

`CarbonInputFormatUtil` is modified so that it can also be used by `CarbonColumnarInputFormat`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata index-interface

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #208

----
commit 398d2ec3e6706c615918a734a90f9dc4111067d8
Author: jackylk <[hidden email]>
Date: 2016-10-03T16:01:48Z

add User API

commit 1d92a00403faeebc09bf595ba11b3e55d4c997f2
Author: jackylk <[hidden email]>
Date: 2016-10-03T16:02:04Z

add Developer API

commit 1812a0a68b53ba5d48fc030e2a59329b0e827b05
Author: jackylk <[hidden email]>
Date: 2016-10-03T16:02:49Z

refactory existing code

commit 430e7710b88725b587c1f3542d4d66ab02958cbc
Author: jackylk <[hidden email]>
Date: 2016-10-03T16:27:10Z

change Index interface

----

> Abstracting Index and Segment interface
> ---------------------------------------
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
> Issue Type: Improvement
> Components: hadoop-integration
> Affects Versions: 0.1.0-incubating
> Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> This issue abstracts the developer API and user API to achieve the following goals:
> Goal 1: Users can choose where to store index data. It can be stored in the
> processing framework's memory space (for example, in Spark driver memory) or in
> a service outside the processing framework (for example, an
> independent database service that can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key that CarbonData currently supports,
> developers are free to add other indexing techniques to make certain
> workloads faster. These new indexes should be added in a pluggable way.
> This JIRA issue has been discussed on the mailing list:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html
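To illustrate the two goals, here is a hypothetical sketch in which index storage (Goal 1) and index type (Goal 2) are separate plug points. `IndexStore`, `IndexPlugin`, `IndexRegistry` and both plugins are invented for illustration and are not CarbonData APIs; an external, shared database service would simply be another `IndexStore` implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Goal 1 (hypothetical): where index data lives is hidden behind an interface.
interface IndexStore {
    void put(String key, Object indexData);
    Object get(String key);
}

// Keeps index data in the processing framework's own memory, e.g. the Spark driver.
// A store backed by an external, shared database would implement the same interface.
final class InProcessIndexStore implements IndexStore {
    private final Map<String, Object> data = new HashMap<>();
    @Override public void put(String key, Object indexData) { data.put(key, indexData); }
    @Override public Object get(String key) { return data.get(key); }
}

// Goal 2 (hypothetical): an index type is a plugin that builds per-block index data and
// answers "might this block contain the value?" (false means the block can be skipped).
interface IndexPlugin {
    Object build(long[] blockValues);
    boolean mayContain(Object indexData, long value);
}

// Min/max range per block: small, but may report false positives.
final class MinMaxPlugin implements IndexPlugin {
    @Override public Object build(long[] values) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long v : values) { min = Math.min(min, v); max = Math.max(max, v); }
        return new long[] {min, max};
    }
    @Override public boolean mayContain(Object indexData, long value) {
        long[] range = (long[]) indexData;
        return value >= range[0] && value <= range[1];
    }
}

// Exact value set per block: larger, but never reports a false positive.
final class ValueSetPlugin implements IndexPlugin {
    @Override public Object build(long[] values) {
        Set<Long> set = new HashSet<>();
        for (long v : values) { set.add(v); }
        return set;
    }
    @Override public boolean mayContain(Object indexData, long value) {
        return ((Set<?>) indexData).contains(value);
    }
}

// Registry that combines any registered plugin with any store.
final class IndexRegistry {
    private final Map<String, IndexPlugin> plugins = new HashMap<>();
    private final IndexStore store;

    IndexRegistry(IndexStore store) { this.store = store; }

    void register(String name, IndexPlugin plugin) { plugins.put(name, plugin); }

    void buildIndex(String name, String blockId, long[] values) {
        store.put(name + "/" + blockId, plugins.get(name).build(values));
    }

    boolean mayContain(String name, String blockId, long value) {
        Object indexData = store.get(name + "/" + blockId);
        // Without index data the block cannot be pruned, so it must be read.
        return indexData == null || plugins.get(name).mayContain(indexData, value);
    }
}
```

In this arrangement, choosing where index data lives means passing a different `IndexStore` to the registry, and adding an index type means registering another `IndexPlugin`; neither change touches the other.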

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)