[jira] [Updated] (CARBONDATA-2747) Lucene build wrong DataMapDistributable for all datamaps with same DataMapSchema

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiangmanhua updated CARBONDATA-2747:
------------------------------------
    Description:
A similar problem in the bloom datamap is tracked in CARBONDATA-2746, but the test result is wrong if we apply the same fix here.

 

Analysis:

`DataMapChooser#extractColumnExpression` does not handle `MatchExpression`, so it provides no column-name information that could be used to filter datamaps.
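As a rough illustration of the missing step, the sketch below pulls field names out of a Lucene-style query string such as `name:c10`, which is the kind of information the chooser would need to filter datamaps by column. `MatchColumnExtractor` and `columnsOf` are hypothetical names, not CarbonData API, and the regex is deliberately simplified (no escaping, quoted phrases or nested syntax).

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CarbonData: collects the field names that a
// Lucene-style query string refers to, e.g. "name:c10" -> {name}.
final class MatchColumnExtractor {
  // An identifier immediately followed by ':' is treated as a field name.
  // Simplified on purpose: escapes and quoted phrases are not handled.
  private static final Pattern FIELD =
      Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)\\s*:");

  private MatchColumnExtractor() {}

  static Set<String> columnsOf(String luceneQuery) {
    Set<String> columns = new LinkedHashSet<>();
    Matcher m = FIELD.matcher(luceneQuery);
    while (m.find()) {
      // Lower-case so the result can be compared with index column names.
      columns.add(m.group(1).toLowerCase(Locale.ROOT));
    }
    return columns;
  }
}
```

For example, `columnsOf("name:c10 AND city:c20")` yields `[name, city]`, in query order.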

 

In `DataMapChooser#contains`, all datamaps are marked as useful whenever a Lucene datamap is hit (`ExpressionType.TEXT_MATCH`). The first datamap is then chosen after the sort step (which sorts by the number of index columns).
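The selection behaviour described above can be modelled in a few lines. In this sketch `LuceneIndexInfo`, `chooseIgnoringColumns` and `chooseByColumns` are hypothetical names, the descending sort direction is an assumption, and it only shows why a tie in column count can leave the city datamap first, and how keeping only datamaps that index a queried column avoids that.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a Lucene datamap: its name and indexed columns.
final class LuceneIndexInfo {
  final String name;
  final List<String> indexColumns;

  LuceneIndexInfo(String name, List<String> indexColumns) {
    this.name = name;
    this.indexColumns = indexColumns;
  }
}

final class ChooserSketch {
  // Assumed sort: more index columns first. ArrayList.sort is stable, so
  // datamaps with equal column counts keep their original order.
  private static final Comparator<LuceneIndexInfo> BY_COLUMN_COUNT =
      Comparator.comparingInt((LuceneIndexInfo d) -> d.indexColumns.size()).reversed();

  private ChooserSketch() {}

  // Behaviour described in the issue: on TEXT_MATCH every Lucene datamap is
  // "useful", so the winner depends only on column count and order.
  static LuceneIndexInfo chooseIgnoringColumns(List<LuceneIndexInfo> all) {
    List<LuceneIndexInfo> useful = new ArrayList<>(all);
    useful.sort(BY_COLUMN_COUNT);
    return useful.isEmpty() ? null : useful.get(0);
  }

  // Fix direction: only datamaps indexing a queried column are useful.
  static LuceneIndexInfo chooseByColumns(List<LuceneIndexInfo> all, Set<String> queryColumns) {
    List<LuceneIndexInfo> useful = new ArrayList<>();
    for (LuceneIndexInfo d : all) {
      if (d.indexColumns.stream().anyMatch(queryColumns::contains)) {
        useful.add(d);
      }
    }
    useful.sort(BY_COLUMN_COUNT);
    return useful.isEmpty() ? null : useful.get(0);
  }
}
```

With two single-column datamaps registered as `dm_city` then `dm_name` (made-up names), `chooseIgnoringColumns` returns `dm_city`, while `chooseByColumns(..., Set.of("name"))` returns `dm_name`.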

 

In `LuceneDataMapFactoryBase#toDistributable`, carbon calls getAllIndexDirs and builds a DataMapDistributable for every index in the same segment. This means that one segment is pruned by several different index datamaps (Lucene uses the `indexPath` in `LuceneDataMapDistributable` to initialize its datamap object and build the `indexSearcherMap`).
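The difference between building a distributable for every index directory in a segment and building one only for the chosen datamap's own directories can be sketched as follows. `IndexDistributable`, the directory layout `<segment>/<datamapName>/<shard>` and both method names are assumptions for illustration; the real `LuceneDataMapDistributable` and `getAllIndexDirs` may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical distributable: the segment to prune and the index dir to open.
final class IndexDistributable {
  final String segmentId;
  final String indexPath;

  IndexDistributable(String segmentId, String indexPath) {
    this.segmentId = segmentId;
    this.indexPath = indexPath;
  }
}

final class ToDistributableSketch {
  private ToDistributableSketch() {}

  // As described above: one distributable per index dir in the segment, no
  // matter which datamap wrote it, so the segment is pruned by every datamap.
  static List<IndexDistributable> forAllIndexDirs(String segmentId, List<String> indexDirs) {
    List<IndexDistributable> out = new ArrayList<>();
    for (String dir : indexDirs) {
      out.add(new IndexDistributable(segmentId, dir));
    }
    return out;
  }

  // CARBONDATA-2746-style restriction (assumed layout): keep only dirs that
  // contain the chosen datamap's name as a path component.
  static List<IndexDistributable> forOwnIndexDirs(
      String segmentId, List<String> indexDirs, String dataMapName) {
    List<IndexDistributable> out = new ArrayList<>();
    for (String dir : indexDirs) {
      if (Arrays.asList(dir.split("/")).contains(dataMapName)) {
        out.add(new IndexDistributable(segmentId, dir));
      }
    }
    return out;
  }
}
```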

 

In our test case, we build two datamaps, one on column name and one on column city.

The query filters on column `name`. Unfortunately, `DataMapChooser` chooses the datamap of the city column.

The `toDistributable` method then gets all datamaps and builds a `LuceneDataMapDistributable` for each of them, so in our test the segment is pruned by every datamap and the results from all of them are returned.

On the city datamap, the Lucene query "name:c10" returns no rows. On the name datamap, the same query returns exactly the rows we want.

 

So, if we apply the CARBONDATA-2746 fix to Lucene as-is, we get only one datamap (the one for the city column) and the prune result is empty.
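A toy simulation makes this consequence concrete. Each hypothetical index here maps an indexed column to value-to-row-id postings, and the row ids and index contents used with it are invented. Pruning `name:c10` against both indexes and taking the union still finds the rows (which is why the current test passes), but pruning only with the wrongly chosen city index finds nothing.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: an "index" is column -> (value -> matching row ids).
final class PruneSketch {
  private PruneSketch() {}

  // A field that this index does not contain simply yields no hits,
  // like querying "name:c10" against the city-only Lucene index.
  static List<Integer> prune(
      Map<String, Map<String, List<Integer>>> index, String column, String value) {
    Map<String, List<Integer>> postings = index.get(column);
    if (postings == null) {
      return Collections.emptyList();
    }
    return postings.getOrDefault(value, Collections.emptyList());
  }

  // Union of the hits from several indexes (the current multi-datamap pruning).
  static Set<Integer> pruneAll(
      List<Map<String, Map<String, List<Integer>>>> indexes, String column, String value) {
    Set<Integer> hits = new TreeSet<>();
    for (Map<String, Map<String, List<Integer>>> index : indexes) {
      hits.addAll(prune(index, column, value));
    }
    return hits;
  }
}
```

With a city index `{city -> {shenzhen -> [1]}}` and a name index `{name -> {c10 -> [10]}}`, `pruneAll` over both returns `[10]`, while `prune` on the city index alone returns an empty list.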

 

To Fix:
 # choose the correct datamap in DataMapChooser for Lucene
 # apply the same fix as in CARBONDATA-2746 to build the correct `LuceneDataMapDistributable`
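Taken together, the two fix steps can be sketched end to end: derive the queried columns from the TEXT_MATCH string, pick a datamap that indexes one of them, then keep only that datamap's index directories. All names, the regex and the `<segment>/<datamapName>/<shard>` directory layout are hypothetical; this is not the actual CarbonData patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// End-to-end sketch of the two fix steps; every name here is hypothetical.
final class LuceneFixSketch {
  // Simplified field detection: an identifier followed by ':'.
  private static final Pattern FIELD = Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)\\s*:");

  private LuceneFixSketch() {}

  // Step 1: pick a datamap whose index columns cover a column used in the
  // TEXT_MATCH query. dataMapColumns maps datamap name -> indexed columns.
  // If several match, the first in map iteration order wins (no ranking here).
  static String chooseDataMap(String luceneQuery, Map<String, List<String>> dataMapColumns) {
    List<String> queryColumns = new ArrayList<>();
    Matcher m = FIELD.matcher(luceneQuery);
    while (m.find()) {
      queryColumns.add(m.group(1));
    }
    for (Map.Entry<String, List<String>> e : dataMapColumns.entrySet()) {
      for (String column : e.getValue()) {
        if (queryColumns.contains(column)) {
          return e.getKey();
        }
      }
    }
    return null; // no Lucene datamap can serve this query
  }

  // Step 2: build distributables (here just index paths) only from the chosen
  // datamap's own directories, assuming its name is a path component.
  static List<String> ownIndexDirs(List<String> segmentIndexDirs, String dataMapName) {
    List<String> out = new ArrayList<>();
    for (String dir : segmentIndexDirs) {
      if (Arrays.asList(dir.split("/")).contains(dataMapName)) {
        out.add(dir);
      }
    }
    return out;
  }
}
```

Because step 1 now picks the name datamap for `name:c10`, restricting step 2 to that datamap's own directories no longer empties the prune result.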

  was:
similar problem in bloom datamap is in issue CARBONDATA-2746;

(The rest of the previous description, from "Analysis:" through "To Fix:", was the same as in the current description.)


> Lucene build wrong DataMapDistributable for all datamaps with same DataMapSchema
> --------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2747
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2747
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: jiangmanhua
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)