[jira] [Created] (CARBONDATA-3770) improve partition count star query performance by reading from valid index files directly

Akash R Nilugal (Jira)
ZHANGSHUNYU created CARBONDATA-3770:
---------------------------------------

             Summary: improve partition count star query performance by reading from valid index files directly
                 Key: CARBONDATA-3770
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3770
             Project: CarbonData
          Issue Type: Improvement
            Reporter: ZHANGSHUNYU


Problem:
 # Currently, a count(*) query whose filter columns are all partition columns loads the datamaps of the matching partitions, including block info and min/max info. Loading them is unnecessary: the row count is stored inside the valid index files, so it can be read from those files directly and cached.

 # For a no-sort partition table, min/max info is of almost no use for pruning, yet loading it still costs time.

Solutions:

If the query is a pure partition count star query, the query flow is as follows:
Step 1. Check, based on the filter, whether the query is a pure partition count star query.
Step 2. Read tablestatus to get all valid segments, and remove the cached segment files of invalid and expired segments.
Step 3. Use multiple threads to read the segment files that are not yet cached, and cache the index file list of each segment in memory. If a segment's index files already exist in the cache, they are not read again.
Step 4. Use multiple threads to prune segments and partitions to get the pruned index file list. This prunes most index files and reduces the number of files to read.
Step 5. Read the row count from each pruned index file directly and cache it in an index_file <-> rowCount map; if the count already exists in the map, get it from the cache.
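The five steps above can be sketched as follows. This is a minimal, hypothetical model, not CarbonData's implementation: `count_star`, `PartitionCountStarCache`, the reader callbacks and the `.carbonindex` paths are illustrative assumptions. tablestatus is reduced to a list of valid segment ids, each index file is assumed to belong to exactly one partition, and the partition filter is modeled as a predicate over a partition spec dict.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical model: each index file belongs to one partition, e.g. {"dt": "2020-04-01"}.
IndexEntry = Tuple[Dict[str, str], str]  # (partition spec, index file path)


def is_pure_partition_count_star(filter_columns: Set[str], partition_columns: Set[str]) -> bool:
    """Step 1: every column referenced by the filter must be a partition column."""
    return bool(filter_columns) and filter_columns <= partition_columns


@dataclass
class PartitionCountStarCache:
    # segment id -> index files listed in that segment's segment file
    segment_index_files: Dict[str, List[IndexEntry]] = field(default_factory=dict)
    # index file path -> row count stored in that index file
    index_row_counts: Dict[str, int] = field(default_factory=dict)


def count_star(
    valid_segments: List[str],                             # valid segment ids from tablestatus
    read_segment_file: Callable[[str], List[IndexEntry]],  # hypothetical segment file reader
    read_row_count: Callable[[str], int],                  # hypothetical index file reader
    partition_filter: Callable[[Dict[str, str]], bool],
    cache: PartitionCountStarCache,
    threads: int = 4,
) -> int:
    valid = set(valid_segments)
    # Step 2: evict segments that are no longer valid, plus the row counts of their index files.
    for seg in [s for s in cache.segment_index_files if s not in valid]:
        for _, path in cache.segment_index_files.pop(seg):
            cache.index_row_counts.pop(path, None)

    def prune(seg: str) -> List[str]:
        # Keep only the index files whose partition satisfies the filter.
        return [path for spec, path in cache.segment_index_files[seg] if partition_filter(spec)]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Step 3: read, in parallel, only the segment files that are not cached yet.
        missing = [s for s in valid_segments if s not in cache.segment_index_files]
        for seg, entries in zip(missing, pool.map(read_segment_file, missing)):
            cache.segment_index_files[seg] = entries
        # Step 4: prune every valid segment by partition, in parallel.
        pruned = [path for paths in pool.map(prune, valid_segments) for path in paths]

    # Step 5: read each pruned index file's row count once; later queries hit the cache.
    total = 0
    for path in pruned:
        if path not in cache.index_row_counts:
            cache.index_row_counts[path] = read_row_count(path)
        total += cache.index_row_counts[path]
    return total
```

In this sketch only the calling thread mutates the cache, so the worker threads only read and need no locking. Step 5 is kept sequential for the same reason, although the row-count reads could also be parallelized with a thread-safe map.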



--
This message was sent by Atlassian Jira
(v8.3.4#803005)