[ https://issues.apache.org/jira/browse/CARBONDATA-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiangmanhua updated CARBONDATA-2747:
------------------------------------
    Description:
A similar problem in the bloom datamap is tracked in issue CARBONDATA-2746.

Analysis:
`DataMapChooser#extractColumnExpression` does not handle `MatchExpression`, so no column name is available to filter the datamaps.

In `DataMapChooser#contains`, all datamaps are marked as useful if a lucene datamap is hit (`ExpressionType.TEXT_MATCH`). The first datamap is then chosen after the sort step (sorted by number of index columns).

In `LuceneDataMapFactoryBase#toDistributable`, carbon uses getAllIndexDirs and builds a DataMapDistributable for each index in the same segment. This means that one segment is pruned by different index datamaps (lucene uses `indexPath` in `LuceneDataMapDistributable` to init its datamap object and build the `indexSearcherMap`).

In our test case, we build two datamaps, one on column name and one on column city. The query uses column `name` as the filter. Unfortunately, `DataMapChooser` chooses the datamap of the city column. Then the `toDistributable` method gets all datamaps and builds a `LuceneDataMapDistributable` for each. In our test, it prunes and gets a result from each datamap. On the datamap of city, the query "name:c10" in lucene returns no rows. On the datamap of name, the query "name:c10" in lucene returns exactly what we want.

So, if we apply just the fix from CARBONDATA-2746 to lucene, we will get only one datamap (the one for the city column) and the prune result will be empty.

To Fix:
# choose the correct datamap in DataMapChooser for lucene
# apply the same fix as in CARBONDATA-2746 to build the correct `LuceneDataMapDistributable`

  was:
A similar problem in the bloom datamap is tracked in issue CARBONDATA-2746.

Analysis:
`DataMapChooser#extractColumnExpression` does not handle `MatchExpression`, so no column name is available to filter the datamaps.

In `DataMapChooser#contains`, all datamaps are marked as useful if a lucene datamap is hit (`ExpressionType.TEXT_MATCH`). The first datamap is then chosen after the sort step (sorted by number of index columns).

In `LuceneDataMapFactoryBase#toDistributable`, carbon uses getAllIndexDirs and builds a DataMapDistributable for each index in the same segment. This means that one segment is pruned by different index datamaps (lucene uses `indexPath` in `LuceneDataMapDistributable` to init its datamap object and build the `indexSearcherMap`).

In our test case, we build two datamaps, one on column name and one on column city. The query uses column `name` as the filter. Unfortunately, `DataMapChooser` chooses the datamap of city

So,

> Lucene build wrong DataMapDistributable for all datamaps with same DataMapSchema
> --------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2747
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2747
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: jiangmanhua
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
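
The first item under "To Fix" (choosing the datamap by the column named in the query) could look roughly like the sketch below. This is only an illustration, not the actual CarbonData change: `DataMapInfo`, `extractColumn` and `choose` are hypothetical names, and it assumes the TEXT_MATCH query is a simple "column:term" string such as "name:c10". Real lucene queries can reference several fields and would need proper parsing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class LuceneDataMapChoiceSketch {

  // Hypothetical stand-in for a datamap and its index columns (not a CarbonData class).
  static final class DataMapInfo {
    final String name;
    final List<String> indexColumns;

    DataMapInfo(String name, List<String> indexColumns) {
      this.name = name;
      this.indexColumns = indexColumns;
    }
  }

  // Assumption: the query has the simple form "column:term".
  // Returns the text before the first ':' or empty if there is none.
  static Optional<String> extractColumn(String textMatchQuery) {
    int idx = textMatchQuery.indexOf(':');
    if (idx <= 0) {
      return Optional.empty();
    }
    return Optional.of(textMatchQuery.substring(0, idx).trim());
  }

  // Picks the first datamap whose index columns contain the queried column,
  // instead of taking the first datamap after sorting.
  static Optional<DataMapInfo> choose(List<DataMapInfo> dataMaps, String textMatchQuery) {
    Optional<String> column = extractColumn(textMatchQuery);
    if (!column.isPresent()) {
      return Optional.empty();
    }
    for (DataMapInfo dataMap : dataMaps) {
      if (dataMap.indexColumns.contains(column.get())) {
        return Optional.of(dataMap);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Same setup as the test case: one datamap on city, one on name.
    List<DataMapInfo> dataMaps = Arrays.asList(
        new DataMapInfo("dm_city", Arrays.asList("city")),
        new DataMapInfo("dm_name", Arrays.asList("name")));
    // prints "dm_name"
    System.out.println(choose(dataMaps, "name:c10").map(d -> d.name).orElse("none"));
  }
}
```

With the chooser returning the name datamap, building only that datamap's `LuceneDataMapDistributable` (the second item) would no longer produce an empty prune result.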