
Re: Support SI at Segment level

Posted by David CaiQiang on Feb 19, 2021; 2:45am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Support-SI-at-Segment-level-tp106256p106338.html

Hi Nihal,
My thoughts are as follows.
1. Differences between segment level and table level
  a) Push down SI into CarbonDataSourceScan/Relation, which avoids rewriting
the SQL plan.
  b) Different segments will have different SIs, so different segments may
choose different SIs.
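
To make point 1b concrete, here is a rough Python sketch of choosing an SI per segment instead of once per table. Everything in it (Segment, choose_si, plan_segments, and the rule "prefer the SI that covers the most filter columns") is hypothetical and is not CarbonData's API; it only shows that each segment gets its own decision, and that a segment without a usable SI falls back to a normal scan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Segment:
    # Hypothetical model: a main-table segment and the SIs built for it.
    segment_id: str
    # SI name -> indexed columns; only SIs that finished loading for this segment.
    secondary_indexes: Dict[str, List[str]] = field(default_factory=dict)


def choose_si(segment: Segment, filter_columns: Set[str]) -> Optional[str]:
    """Pick the SI of this one segment that covers the most filter columns.

    Returns None when no SI matches, meaning the segment is scanned normally.
    The "most filter columns" rule is an assumption for illustration only.
    """
    best_name, best_hits = None, 0
    for name in sorted(segment.secondary_indexes):  # sorted for deterministic ties
        hits = len(filter_columns.intersection(segment.secondary_indexes[name]))
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


def plan_segments(segments: List[Segment],
                  filter_columns: Set[str]) -> Dict[str, Optional[str]]:
    """Map each segment id to its own SI choice (or None for a plain scan)."""
    return {s.segment_id: choose_si(s, filter_columns) for s in segments}
```

For example, if segment "0" has SIs on name and city, segment "1" only on city, and segment "2" has none, a filter on name uses an SI only in segment "0" while the other two segments are scanned directly.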

2. Data loading/compaction/update/delete/merge
  a) The main table can update its tablestatus metadata entry to success
status before SI loading.
  b) If SI is disabled, there is no need to do SI loading; if SI is enabled,
SI loading can be done.
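
A minimal sketch of the ordering in point 2, again in Python with hypothetical names (SegmentStatus, SecondaryIndex, load_segment, build_si); it is not how CarbonData writes tablestatus. The main segment is marked success before any SI work starts (2a), disabled SIs are skipped (2b), and, as an extra assumption not stated above, a failed SI build leaves the main segment visible and that segment simply has no entry for that SI, which is what makes per-segment SI choice (1b) necessary.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class SegmentStatus(Enum):
    # Hypothetical statuses, not CarbonData's real tablestatus values.
    IN_PROGRESS = "in_progress"
    SUCCESS = "success"


@dataclass
class SecondaryIndex:
    name: str
    enabled: bool
    loaded_segments: List[str] = field(default_factory=list)


def load_segment(table_status: Dict[str, SegmentStatus],
                 segment_id: str,
                 write_main_data: Callable[[str], None],
                 indexes: List[SecondaryIndex],
                 build_si: Callable[[SecondaryIndex, str], None]) -> List[str]:
    """Load one main-table segment, then its SIs; return names of SIs built."""
    table_status[segment_id] = SegmentStatus.IN_PROGRESS
    write_main_data(segment_id)
    # (2a) the main segment becomes visible before SI loading starts.
    table_status[segment_id] = SegmentStatus.SUCCESS

    built = []
    for index in indexes:
        if not index.enabled:
            continue  # (2b) SI disabled: no SI loading for it.
        try:
            build_si(index, segment_id)
        except Exception:
            # Assumption: the main segment stays SUCCESS; this segment just
            # lacks this SI and queries on it fall back to a normal scan.
            continue
        index.loaded_segments.append(segment_id)
        built.append(index.name)
    return built
```

The callbacks keep the sketch independent of any real storage; they stand in for writing carbondata files and building SI data.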

3. Query
  a) Reading the data of the SI table could happen on the executor side;
reading the index of the SI table could happen on the driver side.
  b) Performance: currently the system uses a distributed job (groupBy and
Join query) to collect the positionIDs of the result rows; if
TableIndex.prune uses a single thread, it will have a performance issue.
  c) When the table has multiple SI tables, the table-level positionId join
should be converted to a segment-level join.
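
For points 3b and 3c, a rough Python sketch: for each segment, the positionIds returned by that segment's SIs are intersected within the segment (a segment-level join), and segments are pruned concurrently rather than in a single thread. The lookup callback, the position id strings, and prune_segment/prune_segments are all hypothetical; the thread pool only stands in for whatever parallel or distributed pruning would really be used, and the point is just that per-segment work is independent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Set

# Hypothetical lookup: (si_name, segment_id) -> positionIds matching the filter.
PositionLookup = Callable[[str, str], Set[str]]


def prune_segment(segment_id: str, si_names: List[str],
                  lookup: PositionLookup) -> Optional[Set[str]]:
    """Segment-level join: intersect the positionIds each SI gives this segment.

    Returns None when the segment has no SI, meaning "scan the whole segment"
    (an empty set would wrongly mean "no rows match").
    """
    if not si_names:
        return None
    per_si = [lookup(name, segment_id) for name in si_names]
    return set(per_si[0]).intersection(*per_si[1:])


def prune_segments(plan: Dict[str, List[str]], lookup: PositionLookup,
                   max_workers: int = 4) -> Dict[str, Optional[Set[str]]]:
    """Prune all segments concurrently instead of one after another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {seg: pool.submit(prune_segment, seg, names, lookup)
                   for seg, names in plan.items()}
        return {seg: f.result() for seg, f in futures.items()}
```

Because each join only touches positionIds of one segment, no table-wide join of all SI results is needed, and the per-segment plan from point 1 can be fed in directly.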



-----
Best Regards
David Cai