Posted by
bill.zhou on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-add-more-index-for-sort-columns-tp8891.html
hi all
Carbon will add min/max index for sort columns which used for better filter query. So can we add more index for the sort column to make filter faster.
This is one idea which I get from anther database design.
For example this is one student, and the column: score in the student table which will be sorted column. And the score range is from 1 to 100. The table as following:
| id | name | score |
|---|
| 1 | bill001 | 83 |
| 2 | bill002 | 84 |
| 3 | bill003 | 90 |
| 4 | bill004 | 89 |
| 5 | bill005 | 93 |
| 6 | bill006 | 76 |
| 7 | bill007 | 87 |
| 8 | bill008 | 90 |
| 9 | bill009 | 89 |
| 10 | bill010 | 96 |
| 11 | bill011 | 96 |
| 12 | bill012 | 100 |
| 13 | bill013 | 84 |
| 14 | bill014 | 90 |
| 15 | bill015 | 79 |
| 16 | bill016 | 1 |
| 17 | bill017 | 97 |
| 18 | bill018 | 79 |
| 19 | bill019 | 88 |
| 20 | bill068 | 95 |
After load the data into Cabron the score column will sort as following:
| 1 | 76 | 79 | 79 | 83 | 84 | 84 | 87 | 88 | 89 | 89 | 90 | 90 | 90 | 93 | 95 | 96 | 96 | 97 | 100 |
the min/max index is 1/100.
So for the query as following will take all the block data.
query1:select sum(score) from student when score score > 90
query2:select sum(score) from student when score score > 60 and score < 70.
Following two suggestion to reduce the block scan.
Suggestion 1: according the score range to divide into multiple small range for example 4:

0: meas this block has not the score rang value
1: meas this block has the score rang value
If add this index, for the query1 only need scan 1/4 data of the block and query2 no need scan any data, directly sckip this block
Suggestion 2: record more min/max for the score, for example every 5 rows record one min/max

If add this index for query1 only need scan 1/2 data of the block and query2 only need scan 1/4 data of the block
this is the raw idea, please Jacky, Ravindra and liang correct it whether we can add this feature. thanks
Regards
Bill