Apache CarbonData Dev Mailing List archive

Re: Abstracting CarbonData's Index Interface

Posted by Qingqing Zhou on Oct 04, 2016; 5:04pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Abstracting-CarbonData-s-Index-Interface-tp1587p1618.html

On Mon, Oct 3, 2016 at 8:52 PM, Jacky Li <[hidden email]> wrote:
> I think we can try to reuse anything except for Index storage, like
> segment management, query logic processing after InputSplit is gathered
> by calling index interface. I think index can be programmed in different
> level, what I proposed here is still a block level solution, so it can
> be processed in InputFormat level.

I agree the scan provider is at InputFormat level; this is the same
wherever you store the index. The discrepancy I see here is the
implementation of the index itself: if you use a "database service" to
store your index, you can simply invoke a "CREATE INDEX" statement to
implement indexing, but if you want to store the index in Carbon, you will
need to implement the B-tree yourself.
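
To make the contrast concrete, here is a rough sketch of one block-level
index interface with the two kinds of backing store. Every name in it
(BlockIndex, DatabaseServiceIndex, InStoreBlockIndex, the block_ranges
table, the runSql client) is hypothetical and not part of CarbonData; a
java.util.TreeMap stands in for the B-tree, and block min keys are assumed
to be distinct.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

// All names here are hypothetical; none of this is CarbonData's actual API.
interface BlockIndex {
    /** Returns ids of blocks whose key range may overlap [lo, hi]. */
    List<String> prune(long lo, long hi);
}

/** Index delegated to an external database: building it is one DDL statement. */
final class DatabaseServiceIndex implements BlockIndex {
    // Assumed SQL client: takes a statement, returns the first column of the result.
    private final Function<String, List<String>> runSql;

    DatabaseServiceIndex(Function<String, List<String>> runSql) {
        this.runSql = runSql;
    }

    void build() {
        // Table and column names are made up for this sketch.
        runSql.apply("CREATE INDEX idx_min_key ON block_ranges (min_key)");
    }

    @Override
    public List<String> prune(long lo, long hi) {
        return runSql.apply("SELECT block_id FROM block_ranges"
                + " WHERE min_key <= " + hi + " AND max_key >= " + lo);
    }
}

/** Index kept inside the store: the ordered structure is ours to maintain. */
final class InStoreBlockIndex implements BlockIndex {
    private static final class Block {
        final String id;
        final long maxKey;

        Block(String id, long maxKey) {
            this.id = id;
            this.maxKey = maxKey;
        }
    }

    // TreeMap (a red-black tree) stands in for the B-tree; keyed by block min key.
    private final TreeMap<Long, Block> byMinKey = new TreeMap<>();

    void addBlock(String id, long minKey, long maxKey) {
        byMinKey.put(minKey, new Block(id, maxKey));
    }

    @Override
    public List<String> prune(long lo, long hi) {
        List<String> hits = new ArrayList<>();
        // Candidates start at or before hi; keep those that end at or after lo.
        for (Block b : byMinKey.headMap(hi, true).values()) {
            if (b.maxKey >= lo) {
                hits.add(b.id);
            }
        }
        return hits;
    }
}

class BlockIndexDemo {
    public static void main(String[] args) {
        InStoreBlockIndex idx = new InStoreBlockIndex();
        idx.addBlock("b0", 0, 9);
        idx.addBlock("b1", 10, 19);
        idx.addBlock("b2", 20, 29);
        System.out.println(idx.prune(5, 12));
    }
}
```

Both classes satisfy the same prune() contract, so an InputFormat could
call either one; the difference is only in who maintains the ordered
structure.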

Agree "segment management" can be shared, since it applies to the indexed
data. About "query logic processing": currently Carbon pushes certain SARGs
(search arguments) down to the storage level, so in the picture above these
two implementations won't be able to share this logic: the "data service"
one will rely on the query processor (currently Spark) to decide how to use
the index, while the "carbon" one will handle it internally. To change
this, we will have to expose the "carbon" index at the query processor
level.
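
A small sketch of the two styles, to show where the index decision is made
in each. All types below (RangeSarg, MinMaxEntry, PushdownStorage,
ExposedStorage, PlannerSketch) are invented for illustration and are not
CarbonData or Spark classes; the "index" is just a list of per-block
min/max entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; not CarbonData or Spark APIs.
final class RangeSarg {
    final long lo;
    final long hi;

    RangeSarg(long lo, long hi) {
        this.lo = lo;
        this.hi = hi;
    }
}

final class MinMaxEntry {
    final String blockId;
    final long min;
    final long max;

    MinMaxEntry(String blockId, long min, long max) {
        this.blockId = blockId;
        this.min = min;
        this.max = max;
    }

    /** True if the block's key range overlaps the SARG's range. */
    boolean mayMatch(RangeSarg sarg) {
        return min <= sarg.hi && max >= sarg.lo;
    }
}

/** Style 1: the SARG is pushed down and storage consults its index internally. */
final class PushdownStorage {
    private final List<MinMaxEntry> index;

    PushdownStorage(List<MinMaxEntry> index) {
        this.index = index;
    }

    List<String> splitsFor(RangeSarg sarg) {
        List<String> splits = new ArrayList<>();
        for (MinMaxEntry e : index) {
            if (e.mayMatch(sarg)) {
                splits.add(e.blockId);
            }
        }
        return splits;
    }
}

/** Style 2: storage only exposes the index; the caller decides how to use it. */
final class ExposedStorage {
    private final List<MinMaxEntry> index;

    ExposedStorage(List<MinMaxEntry> index) {
        this.index = index;
    }

    List<MinMaxEntry> index() {
        return index;
    }
}

/** Stand-in for the query processor, which owns the pruning logic in style 2. */
final class PlannerSketch {
    static List<String> plan(ExposedStorage storage, RangeSarg sarg) {
        List<String> splits = new ArrayList<>();
        for (MinMaxEntry e : storage.index()) {
            // A real planner could also choose to ignore the index here.
            if (e.mayMatch(sarg)) {
                splits.add(e.blockId);
            }
        }
        return splits;
    }
}
```

In the first style the pruning logic lives behind splitsFor() and cannot be
reused by another store; in the second it lives in the planner, which is
what exposing the index to the query processor would amount to.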

Regards,
Qingqing