Hi all,
Let's discuss supporting full-text search. One solution is to embed the Lucene search library, index the text columns of each segment, and support searching on those text columns.

I've listed some sub-tasks below.

1). Create a Lucene DataMap with the 'text_columns' property, and build the Lucene DataMap for all existing segments:

    create datamap <datamapName> on <tableName>
    using 'lucene'
    dmproperties('text_columns'='col1,col2')

2). Data loading should build the Lucene DataMap for the new segment.

3). Queries should use the Lucene DataMap when the filter contains the match UDF.

4). Compaction should rebuild the Lucene DataMap for the new (compacted) segment.

5). Update and delete should keep the Lucene DataMap in sync.

6). show datamap should display the Lucene DataMap.

7). Deleting a segment should remove that segment's Lucene DataMap.

8). Dropping the table should remove the Lucene DataMap of all segments.

9). Block the streaming feature if the table has a Lucene DataMap.

10). The Pre-aggregate DataMap feature will not support the match UDF.

Any suggestions or questions?

-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
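To make the per-segment idea behind tasks 1 to 4 and 7 concrete, here is a minimal, self-contained Java sketch. It deliberately does not use Lucene: a tiny in-memory inverted index (term -> row ids) stands in for one Lucene index per segment, and the class and method names (`SegmentTextIndex`, `onLoad`, `match`, `onCompaction`, `onDeleteSegment`) are hypothetical, not CarbonData or Lucene APIs. Tokenization is a naive lowercase split on non-alphanumeric characters, standing in for a real Lucene analyzer.

```java
import java.util.*;

/** Hypothetical stand-in for a per-segment Lucene DataMap: one inverted index per segment. */
public class SegmentTextIndex {
    // segmentId -> (term -> ids of rows in that segment whose text column contains the term)
    private final Map<String, Map<String, Set<Integer>>> indexes = new HashMap<>();

    /** Naive tokenizer (assumption, not a Lucene Analyzer): lowercase, split on non-alphanumerics. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Tasks 1 and 2: build the index for one segment (at datamap creation or at data load). */
    public void onLoad(String segmentId, List<String> textColumnValues) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int row = 0; row < textColumnValues.size(); row++) {
            for (String term : tokenize(textColumnValues.get(row))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(row);
            }
        }
        indexes.put(segmentId, postings);
    }

    /** Task 3: a match(term) filter consults each segment's index; segments with no hit are pruned. */
    public Map<String, Set<Integer>> match(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Map<String, Set<Integer>> hits = new TreeMap<>();
        for (Map.Entry<String, Map<String, Set<Integer>>> e : indexes.entrySet()) {
            Set<Integer> rows = e.getValue().get(key);
            if (rows != null) hits.put(e.getKey(), rows);
        }
        return hits;
    }

    /** Task 7: deleting a segment removes its index (task 8 would clear every segment's index). */
    public void onDeleteSegment(String segmentId) {
        indexes.remove(segmentId);
    }

    /** Task 4: compaction writes a new segment, so its index is rebuilt from the merged rows. */
    public void onCompaction(List<String> oldSegmentIds, String newSegmentId, List<String> mergedValues) {
        oldSegmentIds.forEach(indexes::remove);
        onLoad(newSegmentId, mergedValues);
    }

    public static void main(String[] args) {
        SegmentTextIndex dm = new SegmentTextIndex();
        dm.onLoad("0", List.of("Apache CarbonData", "full-text search"));
        dm.onLoad("1", List.of("columnar storage", "Lucene search library"));
        System.out.println(dm.match("search"));   // {0=[1], 1=[1]}
        System.out.println(dm.match("lucene"));   // {1=[1]}
        dm.onCompaction(List.of("0", "1"), "0.1",
                List.of("Apache CarbonData", "full-text search", "columnar storage", "Lucene search library"));
        System.out.println(dm.match("search"));   // {0.1=[1, 3]}
    }
}
```

Keeping one index per segment is what makes the lifecycle tasks cheap: load, compaction, and segment deletion each touch only the indexes of the segments involved, instead of rebuilding a table-wide index.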
Where will the code for this feature live? I think it should be in a separate module. It would be better to treat datamaps as plugins that are not tightly coupled with the carbondata core/processing modules.

> -----Original Messages-----
> From: "David CaiQiang" <[hidden email]>
> Sent Time: 2018-02-07 17:25:39 (Wednesday)
> To: [hidden email]
> Cc:
> Subject: [Discussion] Implement Lucene DataMap to support full text search
It will be an independent module. The layout may look like this:

    carbondata
    |_ datamap
       |__ lucene

-----
Best Regards
David Cai
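A minimal Java sketch of the plugin approach suggested earlier in the thread, combined with the proposed carbondata/datamap/lucene module layout: core code resolves a DataMap provider by the short name given in `using '...'`, and the Lucene module registers itself, so core never depends on Lucene directly. `DataMapRegistry`, `DataMapProvider`, `register`, `create`, and `LuceneProviderSketch` are hypothetical names for illustration, not the actual CarbonData API. A real plugin mechanism might discover providers with `java.util.ServiceLoader` instead of explicit registration.

```java
import java.util.*;
import java.util.function.Function;

public class DataMapRegistry {
    /** Hypothetical plugin contract that core code depends on; it knows nothing about Lucene. */
    public interface DataMapProvider {
        String name();                  // short name used in: using '<name>'
        void build(String segmentId);   // build the index for one segment
    }

    // short name -> factory that creates a provider from the dmproperties map
    private final Map<String, Function<Map<String, String>, DataMapProvider>> factories = new HashMap<>();

    public void register(String shortName, Function<Map<String, String>, DataMapProvider> factory) {
        factories.put(shortName.toLowerCase(Locale.ROOT), factory);
    }

    /** Roughly what "create datamap ... using '<shortName>' dmproperties(...)" would resolve to. */
    public DataMapProvider create(String shortName, Map<String, String> dmProperties) {
        Function<Map<String, String>, DataMapProvider> f = factories.get(shortName.toLowerCase(Locale.ROOT));
        if (f == null) throw new IllegalArgumentException("Unknown datamap provider: " + shortName);
        return f.apply(dmProperties);
    }

    /** Would live in the separate carbondata/datamap/lucene module in the proposed layout. */
    static final class LuceneProviderSketch implements DataMapProvider {
        final List<String> textColumns;

        LuceneProviderSketch(Map<String, String> props) {
            String cols = props.getOrDefault("text_columns", "");
            this.textColumns = cols.isEmpty() ? List.of() : Arrays.asList(cols.split("\\s*,\\s*"));
        }

        public String name() { return "lucene"; }

        public void build(String segmentId) {
            // A real provider would write a Lucene index here; this sketch only reports what it would index.
            System.out.println("index " + textColumns + " of segment " + segmentId);
        }
    }

    public static void main(String[] args) {
        DataMapRegistry registry = new DataMapRegistry();
        registry.register("lucene", LuceneProviderSketch::new);   // done by the plugin module, not by core
        DataMapProvider dm = registry.create("lucene", Map.of("text_columns", "col1,col2"));
        dm.build("0");   // prints: index [col1, col2] of segment 0
    }
}
```

The design point is the direction of the dependency: the core/processing modules only see the `DataMapProvider` contract, while the Lucene module depends on core, which keeps Lucene out of the core build.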
Hi,
Thanks for listing these sub-tasks. I think we can start with the first three tasks, which simply enable writing the text index and querying on it. The other sub-tasks (4 to 10) can be picked up by community developers.

Regards,
Jacky

> On Feb 7, 2018, at 5:25 PM, David CaiQiang <[hidden email]> wrote: