[Discussion] Implement Lucene DataMap to support full text search

David CaiQiang
Hi all,
    Let's discuss supporting full-text search.

    One solution is to embed the Lucene search library, index the text columns of
each segment, and support searching on those text columns.

    I have listed some sub-tasks below.

    1). Create a Lucene DataMap with the 'text_columns' property and build the Lucene
DataMap for all existing segments

       create datamap <datamapName> on <tableName>
       using 'lucene'
       dmproperties('text_columns'='col1,col2')

    2). Data load should build the Lucene DataMap for the new segment

    3). Queries should use the Lucene DataMap when filters contain the match UDF

    4). Compaction should rebuild the Lucene DataMap for the new segment

    5). Update and delete should keep the Lucene DataMap in sync

    6). Show DataMap should support the Lucene DataMap

    7). Deleting a segment should remove the Lucene DataMap of that segment

    8). Dropping the table should remove the Lucene DataMap of all segments

    9). Block the streaming feature if the table has a Lucene DataMap

    10). The pre-aggregate DataMap feature will not support the match UDF
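
    To make the per-segment lifecycle in the sub-tasks concrete, here is a rough sketch. It is only an illustration: it uses a toy in-memory inverted index in Python as a stand-in for a real Lucene index, and the names SegmentTextIndex, TextDataMap, build_segment, drop_segment and match are hypothetical, not CarbonData or Lucene APIs.

```python
from collections import defaultdict


class SegmentTextIndex:
    """Toy inverted index for one segment (a stand-in for a Lucene index)."""

    def __init__(self, text_columns):
        self.text_columns = list(text_columns)
        # (column, term) -> set of row ids inside this segment
        self.postings = defaultdict(set)

    def add_row(self, row_id, row):
        for col in self.text_columns:
            value = row.get(col) or ""
            # Naive whitespace tokenizer; Lucene would use an analyzer here.
            for term in str(value).lower().split():
                self.postings[(col, term)].add(row_id)

    def match(self, column, term):
        # .get() avoids creating empty entries in the defaultdict.
        return set(self.postings.get((column, term.lower()), ()))


class TextDataMap:
    """Hypothetical datamap that keeps one text index per segment id."""

    def __init__(self, text_columns):
        self.text_columns = list(text_columns)
        self.segments = {}

    def build_segment(self, segment_id, rows):
        # Sub-tasks 1, 2 and 4: (re)build the index of a single segment.
        index = SegmentTextIndex(self.text_columns)
        for row_id, row in enumerate(rows):
            index.add_row(row_id, row)
        self.segments[segment_id] = index

    def drop_segment(self, segment_id):
        # Sub-task 7: removing a segment removes its index as well.
        self.segments.pop(segment_id, None)

    def match(self, column, term):
        # Sub-task 3: return {segment_id: sorted row ids} for segments with hits,
        # so segments without any hit can be skipped by the query.
        hits = {}
        for segment_id, index in self.segments.items():
            rows = index.match(column, term)
            if rows:
                hits[segment_id] = sorted(rows)
        return hits
```

    For example, after building segment "0" from two rows where only the first row has "lucene" in col1, match("col1", "lucene") would report row 0 of segment "0", and dropping that segment removes the hit. Update and delete (sub-task 5) are left out of this sketch.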

    Any suggestions or questions?



-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Re: [Discussion] Implement Lucene DataMap to support full text search

xuchuanyin

Where will the code of this feature be?
I think it should live in a separate module. It would be better to treat datamaps as plugins that are not tightly coupled to the carbondata core/processing modules.
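
To illustrate what I mean by treating datamaps as plugins, here is a minimal sketch of a provider registry, written in Python only for brevity. Everything in it (DataMapRegistry, LuceneLikeDataMap, register, create) is a hypothetical name for discussion, not an existing CarbonData interface.

```python
class DataMapRegistry:
    """Maps a provider name (the value given after USING) to a factory."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        # Provider names are treated as case-insensitive in this sketch.
        self._factories[name.lower()] = factory

    def create(self, name, properties):
        try:
            factory = self._factories[name.lower()]
        except KeyError:
            raise ValueError(f"unknown datamap provider: {name}") from None
        return factory(properties)


class LuceneLikeDataMap:
    """Hypothetical plugin living in its own module, outside core."""

    def __init__(self, properties):
        raw = properties.get("text_columns", "")
        self.text_columns = [c.strip() for c in raw.split(",") if c.strip()]


# The core module only knows the registry; the plugin registers itself.
registry = DataMapRegistry()
registry.register("lucene", LuceneLikeDataMap)
datamap = registry.create("lucene", {"text_columns": "col1,col2"})
```

With this shape the core/processing modules depend only on the registry, and the Lucene dependency stays inside the plugin module.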

> -----Original Messages-----
> From: "David CaiQiang" <[hidden email]>
> Sent Time: 2018-02-07 17:25:39 (Wednesday)
> To: [hidden email]
> Cc:
> Subject: [Discussion] Implement Lucene DataMap to support full text search
Re: [Discussion] Implement Lucene DataMap to support full text search

David CaiQiang
It will be an independent module.
The layout may look like this:

carbondata
   |_ datamap
           |__ lucene



-----
Best Regards
David Cai
Re: [Discussion] Implement Lucene DataMap to support full text search

Jacky Li
In reply to this post by David CaiQiang
Hi,

Thanks for listing these sub-tasks.
I think we can start with the first three tasks, which are enough to enable writing a text index and querying on it. The other sub-tasks (4 to 10) can be picked up by community developers.

Regards,
Jacky

> On Feb 7, 2018, at 5:25 PM, David CaiQiang <[hidden email]> wrote: