[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542797#comment-15542797 ]

ASF GitHub Bot commented on CARBONDATA-284:
-------------------------------------------

GitHub user jackylk opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/208

    [CARBONDATA-284][WIP] Abstracting index and segment interface

    This PR adds a new User API and Dev API to the carbon-hadoop module:
   
    ### User API
    - `CarbonColumnarInputFormat/OutputFormat`: uses the current `CarbonInputFormat` as its internal implementation.
    - `CarbonRowInputFormat/OutputFormat`: not yet implemented.
    - `CarbonOutputCommitter`: manages segment commits.
   
    These formats are built on `CarbonInputFormatBase/OutputFormatBase`.
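The PR does not spell out how `CarbonOutputCommitter` manages a segment commit. The following is a minimal sketch under stated assumptions: the method names are loosely modeled on Hadoop's `OutputCommitter` job lifecycle, while `SegmentCommitter`, `SegmentStatus`, and the in-memory status table are hypothetical stand-ins for CarbonData's real segment metadata. The idea shown is that a new segment stays invisible to readers until its job commits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical segment states; CarbonData's real load statuses may differ.
enum SegmentStatus { IN_PROGRESS, SUCCESS, ABORTED }

// Hypothetical committer: tracks each segment's status in an in-memory table.
final class SegmentCommitter {
    private final Map<Integer, SegmentStatus> statusTable = new LinkedHashMap<>();
    private int nextSegmentId = 0;

    // Reserve a new segment id; the segment stays invisible to readers.
    synchronized int setupJob() {
        int id = nextSegmentId++;
        statusTable.put(id, SegmentStatus.IN_PROGRESS);
        return id;
    }

    // Make the segment visible to readers.
    synchronized void commitJob(int id) {
        transition(id, SegmentStatus.SUCCESS);
    }

    // Discard the segment; it never becomes visible.
    synchronized void abortJob(int id) {
        transition(id, SegmentStatus.ABORTED);
    }

    // Only an in-progress segment may be committed or aborted.
    private void transition(int id, SegmentStatus target) {
        SegmentStatus current = statusTable.get(id);
        if (current != SegmentStatus.IN_PROGRESS) {
            throw new IllegalStateException("segment " + id + " is " + current);
        }
        statusTable.put(id, target);
    }

    // What an input format would scan: committed segments only, in creation order.
    synchronized List<Integer> readableSegments() {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, SegmentStatus> e : statusTable.entrySet()) {
            if (e.getValue() == SegmentStatus.SUCCESS) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}
```

Keeping every status transition in one guarded method means a reader never observes a half-written segment; whether the actual committer in this PR works exactly this way is not stated.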
   
    ### Dev API
    - `Segment`: an abstract class that represents a single load of data. It is used by `CarbonInputFormatBase` to get all InputSplits matching a `QueryModel`, and by `CarbonOutputCommitter` to prepare the segment for reading. Implementation examples are `IndexedSegment` and `StreamingSegment`.
    - `SegmentManager`: an interface to manage segments. The current implementation is `ZkSegmentManager`, which needs to be mapped to the existing logic.
    - `Index`: an interface used by `IndexedSegment` to filter InputSplits. The current implementation is `InMemoryBTreeIndex`, which loads the index into the driver's memory.
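To make the relationships among the three Dev API types concrete, here is a self-contained sketch. Every type in it is a hypothetical stand-in, and the method names are assumptions rather than the PR's signatures: `Split` replaces Hadoop's `InputSplit`, `RangeQuery` replaces `QueryModel`, `InMemorySegmentManager` replaces the ZooKeeper-backed `ZkSegmentManager`, and `InMemoryTreeIndex` uses `java.util.TreeMap` (a red-black tree, not a B+ tree) in place of `InMemoryBTreeIndex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for InputSplit: one file region covering a key range.
final class Split {
    final String path;
    final long minKey;
    final long maxKey;
    Split(String path, long minKey, long maxKey) {
        this.path = path; this.minKey = minKey; this.maxKey = maxKey;
    }
}

// Hypothetical stand-in for QueryModel: an inclusive key range filter.
final class RangeQuery {
    final long from;
    final long to;
    RangeQuery(long from, long to) { this.from = from; this.to = to; }
}

// Index: prunes the splits of one segment for a query.
interface Index {
    void build(List<Split> splits);
    List<Split> filter(RangeQuery query);
}

// Tree-based index keyed by each split's minimum key (TreeMap, not a B+ tree).
final class InMemoryTreeIndex implements Index {
    private final TreeMap<Long, List<Split>> byMinKey = new TreeMap<>();
    public void build(List<Split> splits) {
        for (Split s : splits) {
            byMinKey.computeIfAbsent(s.minKey, k -> new ArrayList<>()).add(s);
        }
    }
    public List<Split> filter(RangeQuery q) {
        List<Split> result = new ArrayList<>();
        // Only splits starting at or before q.to can overlap; then check their upper bound.
        for (List<Split> bucket : byMinKey.headMap(q.to, true).values()) {
            for (Split s : bucket) {
                if (s.maxKey >= q.from) result.add(s);
            }
        }
        return result;
    }
}

// Segment: one load of data that can produce the splits matching a query.
abstract class Segment {
    final String id;
    Segment(String id) { this.id = id; }
    abstract List<Split> getSplits(RangeQuery query);
}

// IndexedSegment delegates pruning to whatever Index it was given.
final class IndexedSegment extends Segment {
    private final Index index;
    IndexedSegment(String id, List<Split> splits, Index index) {
        super(id);
        this.index = index;
        index.build(splits);
    }
    List<Split> getSplits(RangeQuery query) { return index.filter(query); }
}

// SegmentManager: tracks which segments are valid for reading.
interface SegmentManager {
    void commit(Segment segment);
    List<Segment> validSegments();
}

final class InMemorySegmentManager implements SegmentManager {
    private final List<Segment> segments = new ArrayList<>();
    public synchronized void commit(Segment segment) { segments.add(segment); }
    public synchronized List<Segment> validSegments() { return new ArrayList<>(segments); }
}

// Roughly what an input format's split planning could do in this sketch.
final class SplitPlanner {
    static List<Split> plan(SegmentManager manager, RangeQuery query) {
        List<Split> all = new ArrayList<>();
        for (Segment s : manager.validSegments()) all.addAll(s.getSplits(query));
        return all;
    }
}
```

Because `IndexedSegment` depends only on the `Index` interface, a different index implementation can be swapped in without touching the planning code.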
   
    `CarbonInputFormatUtil` is modified so that it can also be used by `CarbonColumnarInputFormat`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata index-interface

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #208
   
----
commit 398d2ec3e6706c615918a734a90f9dc4111067d8
Author: jackylk <[hidden email]>
Date:   2016-10-03T16:01:48Z

    add User API

commit 1d92a00403faeebc09bf595ba11b3e55d4c997f2
Author: jackylk <[hidden email]>
Date:   2016-10-03T16:02:04Z

    add Developer API

commit 1812a0a68b53ba5d48fc030e2a59329b0e827b05
Author: jackylk <[hidden email]>
Date:   2016-10-03T16:02:49Z

    refactory existing code

commit 430e7710b88725b587c1f3542d4d66ab02958cbc
Author: jackylk <[hidden email]>
Date:   2016-10-03T16:27:10Z

    change Index interface

----


> Abstracting Index and Segment interface
> ---------------------------------------
>
>                 Key: CARBONDATA-284
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-284
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: hadoop-integration
>    Affects Versions: 0.1.0-incubating
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> This issue aims to abstract the developer API and user API to achieve the following goals:
> Goal 1: Users can choose where to store index data. It can be stored in the
> processing framework's memory space (for example, in Spark driver memory) or in
> a service outside the processing framework (for example, an
> independent database service that can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key that CarbonData currently supports,
> developers are free to add other indexing techniques to speed up certain
> workloads. These new indexes should be added in a pluggable way.
> This issue has been discussed on the mailing list:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html
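Both goals can be illustrated with two small abstractions. This is a sketch under stated assumptions, not the design adopted by the issue: a hypothetical `IndexStore` hides where index data lives (Goal 1), and a hypothetical `IndexRegistry` lets a developer plug in a new index type by name (Goal 2). The min/max index is only an example of "other indexing techniques"; the issue does not propose it.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Goal 1: callers do not know whether index data is local or in an external service.
interface IndexStore {
    void put(String segmentId, byte[] serializedIndex);
    Optional<byte[]> get(String segmentId);
}

// Keeps index data in the processing framework's own memory (e.g. a driver process).
final class InMemoryIndexStore implements IndexStore {
    private final Map<String, byte[]> data = new HashMap<>();
    public synchronized void put(String segmentId, byte[] serializedIndex) {
        data.put(segmentId, serializedIndex.clone());
    }
    public synchronized Optional<byte[]> get(String segmentId) {
        byte[] value = data.get(segmentId);
        return Optional.ofNullable(value == null ? null : value.clone());
    }
}

// Goal 2: a pluggable index type that builds and queries a serialized index.
interface IndexType {
    String name();
    byte[] build(long[] keys);
    boolean mightContain(byte[] index, long key);
}

// Example plug-in: stores only the minimum and maximum key (16 bytes).
final class MinMaxIndex implements IndexType {
    public String name() { return "minmax"; }
    public byte[] build(long[] keys) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long k : keys) {
            min = Math.min(min, k);
            max = Math.max(max, k);
        }
        return ByteBuffer.allocate(16).putLong(min).putLong(max).array();
    }
    public boolean mightContain(byte[] index, long key) {
        ByteBuffer buf = ByteBuffer.wrap(index);
        long min = buf.getLong();
        long max = buf.getLong();
        return key >= min && key <= max;
    }
}

// Index types are looked up by name, so new ones need no changes elsewhere.
final class IndexRegistry {
    private final Map<String, IndexType> types = new HashMap<>();
    void register(IndexType type) { types.put(type.name(), type); }
    IndexType get(String name) {
        IndexType type = types.get(name);
        if (type == null) throw new IllegalArgumentException("unknown index type: " + name);
        return type;
    }
}
```

A store backed by a shared database service would implement the same `IndexStore` interface; storing serialized bytes rather than live objects is what makes such an out-of-process store possible.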



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)