Login  Register

【DISCUSS】add more index for sort columns

Posted by bill.zhou on Mar 14, 2017; 3:10pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-add-more-index-for-sort-columns-tp8891.html

hi all

  Carbon will add min/max index for sort columns which used for better filter query. So can we add more index for the sort column to make filter faster.

  This is one idea which I get from anther database design.
   For example this is one student, and the column: score in the student table which will be sorted column. And the score range is from 1 to 100.  The table as following:
idnamescore
1 bill00183
2 bill00284
3 bill00390
4 bill00489
5 bill00593
6 bill00676
7 bill00787
8 bill00890
9 bill00989
10bill01096
11bill01196
12bill012100
13bill01384
14bill01490
15bill01579
16bill0161
17bill01797
18bill01879
19bill01988
20bill06895
   After load the data into Cabron the score column will sort as following:
1767979838484878889899090909395969697100
the min/max index is 1/100.
So for the query as following will take all the block data.
query1:select sum(score) from student when score score > 90
query2:select sum(score) from student when score score > 60 and score < 70.

Following two suggestion to reduce the block scan.
Suggestion 1: according the score range to divide into multiple small range for example 4:

0: meas this block has not the score rang value
1: meas this block has the score rang value
If add this index, for the query1 only need scan 1/4 data of the block and query2 no need scan any data, directly sckip this block

Suggestion 2: record more min/max for the score, for example every 5 rows record one min/max

If add this index for query1 only need scan 1/2 data of the block and query2 only need scan 1/4 data of the block

this is the raw idea, please Jacky, Ravindra and liang correct it whether we can add this feature. thanks

Regards
Bill