Apache CarbonData Dev Mailing List archive

Re: Abstracting CarbonData's Index Interface

Posted by Qingqing Zhou on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Abstracting-CarbonData-s-Index-Interface-tp1587p1609.html

On Fri, Sep 30, 2016 at 10:31 PM, Jacky Li <[hidden email]> wrote:
> However, it also introduces memory consumption for the index tree and
> impacts first-query time because of the process of loading the index from
> the file footer into memory. On the other hand, in a multi-tenant
> environment, multiple applications may access data files simultaneously,
> which again exacerbates this resource consumption issue.
>
Agreed, we should at least not rely so heavily on driver memory for indexing.

>
> Goal 1: Users can choose where to store index data. It can be stored
> in the processing framework's memory space (like Spark driver memory) or
> in another service outside of the processing framework (like an
> independent database service).
>

How much code will be shared when the same index is stored in different
"places"? For example, for a B-tree index, if you do it inside Carbon, you
are programming at the block level and you have to worry about block
[de]allocation, tree balancing, etc. But if you rely on a database service,
you are programming at the table level, working with relational tables and
indexes. Meanwhile, an index is essentially redundant data, so updates need
careful design if the index is outside of your control.
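
To make the code-sharing question concrete, here is a minimal Java sketch. Every name in it (IndexStore, InMemoryIndexStore, TableIndexStore, findBlock) is hypothetical and not part of CarbonData; the TreeMap only stands in for a hand-built B-tree, and the in-process list only stands in for a table in an external database service.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical abstraction: the only code both "places" would share.
interface IndexStore {
    void put(String startKey, String blockPath);
    // Returns the block whose start key is the greatest one <= key.
    Optional<String> findBlock(String key);
}

// Index kept in the framework's own memory (e.g. the driver).
// TreeMap is a stand-in for a tree we would balance ourselves.
final class InMemoryIndexStore implements IndexStore {
    private final TreeMap<String, String> tree = new TreeMap<>();

    public void put(String startKey, String blockPath) {
        tree.put(startKey, blockPath);
    }

    public Optional<String> findBlock(String key) {
        Map.Entry<String, String> e = tree.floorEntry(key);
        return e == null ? Optional.empty() : Optional.of(e.getValue());
    }
}

// Stand-in for an external database service: we only see rows of a table.
final class TableIndexStore implements IndexStore {
    private static final class Row {
        final String startKey;
        final String blockPath;
        Row(String startKey, String blockPath) {
            this.startKey = startKey;
            this.blockPath = blockPath;
        }
    }

    private final List<Row> table = new ArrayList<>();

    public void put(String startKey, String blockPath) {
        // Upsert: the index is redundant data, so every data update
        // must also be reflected in this table.
        table.removeIf(r -> r.startKey.equals(startKey));
        table.add(new Row(startKey, blockPath));
    }

    public Optional<String> findBlock(String key) {
        // Roughly: SELECT block_path WHERE start_key <= ?
        //          ORDER BY start_key DESC LIMIT 1
        return table.stream()
                .filter(r -> r.startKey.compareTo(key) <= 0)
                .max(Comparator.comparing((Row r) -> r.startKey))
                .map(r -> r.blockPath);
    }
}

final class IndexStoreDemo {
    public static void main(String[] args) {
        IndexStore[] stores = { new InMemoryIndexStore(), new TableIndexStore() };
        for (IndexStore store : stores) {
            store.put("a", "block-0");
            store.put("m", "block-1");
            System.out.println(store.getClass().getSimpleName()
                    + " -> " + store.findBlock("p").orElse("none"));
        }
    }
}
```

In this sketch the two implementations share nothing beyond the interface, which is the concern raised above: the block-level and table-level code paths diverge almost completely.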

Regards,
Qingqing