[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542797#comment-15542797 ]

ASF GitHub Bot commented on CARBONDATA-284:
-------------------------------------------

GitHub user jackylk opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/208

    [CARBONDATA-284][WIP] Abstracting index and segment interface

    This PR adds a new User API and Dev API for the carbon-hadoop module:

    ### User API
    - `CarbonColumnarInputFormat/OutputFormat`: uses the current `CarbonInputFormat` as its internal implementation.
    - `CarbonRowInputFormat/OutputFormat`: still to be implemented.
    - `CarbonOutputCommitter`: used for managing segment commit.

    They are based on `CarbonInputFormatBase/OutputFormatBase`.

    ### Dev API
    - Segment: an abstract class that represents a single load of data. It is used by `CarbonInputFormatBase` to get all `InputSplit`s matching a `QueryModel`, and by `CarbonOutputCommitter` to prepare for reading. Implementation examples are `IndexedSegment` and `StreamingSegment`.
    - SegmentManager: an interface to manage segments. The current implementation is `ZkSegmentManager`, which needs to be mapped to existing logic.
    - Index: an interface that is used by `IndexedSegment` to filter `InputSplit`s. The current implementation is `InMemoryBTreeIndex`, which loads the index into the driver's memory.

    `CarbonInputFormatUtil` is modified so that it can also be used by `CarbonColumnarInputFormat`.
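To make the relationship between the three Dev API pieces concrete, here is a minimal Java sketch. It is not the code in the pull request: every signature (`filter`, `getSplits`, `commit`, `getValidSegments`) is an assumption made for illustration, `Split` and `KeyRange` are hypothetical stand-ins for Hadoop's `InputSplit` and CarbonData's `QueryModel`, and `MinMaxIndex` and `InMemorySegmentManager` are simplified placeholders for `InMemoryBTreeIndex` and `ZkSegmentManager`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an InputSplit: a file part plus the key range it covers.
final class Split {
    final String path;
    final int minKey, maxKey;
    Split(String path, int minKey, int maxKey) {
        this.path = path; this.minKey = minKey; this.maxKey = maxKey;
    }
}

// Hypothetical stand-in for the filter in a QueryModel: an inclusive key range.
final class KeyRange {
    final int low, high;
    KeyRange(int low, int high) { this.low = low; this.high = high; }
}

// Index (assumed shape): keeps only the splits that may contain matching rows.
interface Index {
    List<Split> filter(List<Split> candidates, KeyRange range);
}

// Deliberately simple index: a linear min/max overlap check, not a B-tree.
final class MinMaxIndex implements Index {
    @Override
    public List<Split> filter(List<Split> candidates, KeyRange range) {
        List<Split> hits = new ArrayList<>();
        for (Split s : candidates) {
            if (s.maxKey >= range.low && s.minKey <= range.high) hits.add(s);
        }
        return hits;
    }
}

// Segment (assumed shape): one load of data that can report its matching splits.
abstract class Segment {
    final String id;
    Segment(String id) { this.id = id; }
    abstract List<Split> getSplits(KeyRange range);
}

// A segment that delegates pruning to whichever Index it was given.
final class IndexedSegment extends Segment {
    private final List<Split> allSplits;
    private final Index index;
    IndexedSegment(String id, List<Split> allSplits, Index index) {
        super(id);
        this.allSplits = allSplits;
        this.index = index;
    }
    @Override
    List<Split> getSplits(KeyRange range) { return index.filter(allSplits, range); }
}

// SegmentManager (assumed shape): tracks the segments that readers may see.
interface SegmentManager {
    void commit(Segment segment);
    List<Segment> getValidSegments();
}

// In-memory manager for this sketch only; it does not talk to ZooKeeper.
final class InMemorySegmentManager implements SegmentManager {
    private final List<Segment> committed = new ArrayList<>();
    @Override
    public void commit(Segment segment) { committed.add(segment); }
    @Override
    public List<Segment> getValidSegments() { return new ArrayList<>(committed); }
}

final class SegmentIndexSketch {
    // Roughly what an input format could do: collect matching splits from every valid segment.
    static List<Split> planSplits(SegmentManager manager, KeyRange range) {
        List<Split> plan = new ArrayList<>();
        for (Segment segment : manager.getValidSegments()) {
            plan.addAll(segment.getSplits(range));
        }
        return plan;
    }

    public static void main(String[] args) {
        SegmentManager manager = new InMemorySegmentManager();
        manager.commit(new IndexedSegment("0",
            Arrays.asList(new Split("part-0", 0, 9), new Split("part-1", 10, 19)),
            new MinMaxIndex()));
        // Only part-1 overlaps the range [12, 15], so only "part-1" is printed.
        for (Split s : planSplits(manager, new KeyRange(12, 15))) {
            System.out.println(s.path);
        }
    }
}
```

Because `IndexedSegment` only sees the `Index` interface, a different index (for example one backed by an external service) could be passed in without changing the segment or the split-planning code, which is the kind of pluggability the issue asks for.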
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata index-interface

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #208

----
commit 398d2ec3e6706c615918a734a90f9dc4111067d8
Author: jackylk <[hidden email]>
Date:   2016-10-03T16:01:48Z

    add User API

commit 1d92a00403faeebc09bf595ba11b3e55d4c997f2
Author: jackylk <[hidden email]>
Date:   2016-10-03T16:02:04Z

    add Developer API

commit 1812a0a68b53ba5d48fc030e2a59329b0e827b05
Author: jackylk <[hidden email]>
Date:   2016-10-03T16:02:49Z

    refactory existing code

commit 430e7710b88725b587c1f3542d4d66ab02958cbc
Author: jackylk <[hidden email]>
Date:   2016-10-03T16:27:10Z

    change Index interface

----

> Abstracting Index and Segment interface
> ---------------------------------------
>
>                 Key: CARBONDATA-284
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-284
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: hadoop-integration
>    Affects Versions: 0.1.0-incubating
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve the following goals:
> Goal 1: Users can choose where to store index data. It can be stored in the
> processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on a multi-dimensional key, which CarbonData currently supports,
> developers are free to add other indexing technologies to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)