Re: Support SI at Segment level
Posted by
David CaiQiang on
Feb 19, 2021; 2:45am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Support-SI-at-Segment-level-tp106256p106338.html
Hi Nihal,
My thoughts are as follows.
1. Differences between segment-level and table-level SI
a) Segment-level SI can be pushed down into CarbonDataSourceScan/Relation,
which avoids rewriting the SQL plan.
b) Different segments will have different SIs, so different segments may
choose different SIs.
2. Data loading/compaction/update/delete/merge
a) The main table can update its tablestatus metadata entry to success
status before SI loading.
b) If SI is disabled, there is no need to do SI loading; if SI is enabled,
SI loading can proceed.
3. Query
a) Reading the data of the SI table could happen on the executor side;
reading the index of the SI table could happen on the driver side.
b) Performance: the system currently uses a distributed job (groupBy and
join query) to collect the positionIds of the result rows; if
TableIndex.prune uses a single thread, it will have a performance issue.
c) When the table has multiple SI tables, the table-level positionId join
should be converted to a segment-level join.
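To make points 1b and 3c concrete, here is a rough sketch of what
segment-level pruning could look like. It is plain Python, not CarbonData
code: SEGMENT_SI, SI_DATA, prune_segment and prune are hypothetical names
made up for illustration, and the position IDs are simplified strings.
Each segment uses only the SIs that exist for it, and the positionId
intersection (the "join") happens inside a segment rather than across the
whole table.

```python
# Hypothetical metadata: which SI tables have been loaded for each segment.
# None of these names are CarbonData APIs; they only illustrate the idea.
SEGMENT_SI = {
    "0": ["si_name", "si_city"],
    "1": ["si_name"],  # si_city has not been loaded for this segment
    "2": [],           # no SI at all: fall back to a normal scan
}

# Hypothetical SI content: (si_table, segment_id) -> {value: position IDs}
SI_DATA = {
    ("si_name", "0"): {"alice": {"0/0", "0/1"}},
    ("si_city", "0"): {"paris": {"0/1", "0/3"}},
    ("si_name", "1"): {"alice": {"1/0", "1/2"}},
}


def prune_segment(segment_id, filters):
    """Intersect position IDs from every SI usable in this one segment.

    filters maps an SI table name to the value being searched for.
    Returns None when no SI applies, meaning the segment needs a full scan.
    """
    result = None
    for si_table, value in filters.items():
        if si_table not in SEGMENT_SI.get(segment_id, []):
            continue  # this segment lacks the SI; the scan applies the filter
        hits = SI_DATA.get((si_table, segment_id), {}).get(value, set())
        # Segment-level join: intersect only within this segment.
        result = hits if result is None else result & hits
    return result


def prune(filters):
    """Build a per-segment plan: a set of position IDs, or None for full scan."""
    return {seg: prune_segment(seg, filters) for seg in SEGMENT_SI}


plan = prune({"si_name": "alice", "si_city": "paris"})
```

In this sketch segment 0 intersects both SIs, segment 1 uses only si_name,
and segment 2 is left for a full scan. Since each segment is pruned
independently, the per-segment work could also be spread over multiple
threads or tasks, which relates to the concern in 3b.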
-----
Best Regards
David Cai