[jira] [Created] (CARBONDATA-2941) Support decision based min max writing for a column

Akash R Nilugal (Jira)
Manish Gupta created CARBONDATA-2941:
----------------------------------------

             Summary: Support decision based min max writing for a column
                 Key: CARBONDATA-2941
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2941
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Manish Gupta
            Assignee: Manish Gupta


*Background* 
Currently we store min max for all the columns: page min max, blocklet 
min max in the file footer, and all the blocklet metadata entries in the 
shard. Consider the case where each column value is longer than 10000 
characters. If we write min max in this case, it will be written 3 times 
for each column. This increases the store size, which in turn impacts 
query performance. 

*Design proposal* 
1. We will introduce a configurable system level property for the maximum 
number of characters, *"carbon.string.allowed.character.count"*. If a 
column's data crosses this limit then min max will not be stored for that 
column. 
2. If a page does not contain min max for a column, then blocklet min max 
will also not contain the entry for min max of that column. 
3. The Thrift file will be modified to introduce an optional Boolean flag 
which will be used in query to identify whether min max is stored for the 
filter column or not. 
4. As of now it will be supported only for dimensions of string/varchar 
type. We can extend it further to support bigDecimal type measures also in 
future if required. 
5. The block and blocklet dataMap cache will also store the min max 
Boolean flag for dimension columns, based on which filter pruning will be 
done. If min max is not written for a filter column then isScanRequired 
will return true in driver pruning. 
6. In executor again page and blocklet level min max will be checked for 
filter column. If min max is not written then complete page data will be 
scanned. 
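
The write-side decision and the pruning fallback described in points 1, 2, 5 and 6 can be sketched roughly as below. This is only an illustration under stated assumptions, not CarbonData code: the class `MinMaxDecisionSketch`, the holder `ColumnMinMax`, and all method signatures are hypothetical. It also assumes that the limit is checked per value, that the filter is a simple equality filter, and that values compare as Java strings.

```java
public class MinMaxDecisionSketch {

    /** Hypothetical holder for one column's min max on a page or blocklet. */
    static final class ColumnMinMax {
        final boolean minMaxWritten; // the Boolean flag from point 3
        final String min;
        final String max;

        ColumnMinMax(boolean minMaxWritten, String min, String max) {
            this.minMaxWritten = minMaxWritten;
            this.min = min;
            this.max = max;
        }
    }

    /** Point 1: skip min max when any value crosses the allowed character count (assumed per-value check). */
    static ColumnMinMax buildPageMinMax(String[] values, int allowedCharCount) {
        String min = null;
        String max = null;
        for (String v : values) {
            if (v.length() > allowedCharCount) {
                return new ColumnMinMax(false, null, null);
            }
            if (min == null || v.compareTo(min) < 0) min = v;
            if (max == null || v.compareTo(max) > 0) max = v;
        }
        return new ColumnMinMax(true, min, max);
    }

    /** Point 2: if any page lacks min max, the blocklet entry lacks it too. */
    static ColumnMinMax mergeToBlocklet(ColumnMinMax[] pages) {
        String min = null;
        String max = null;
        for (ColumnMinMax p : pages) {
            if (!p.minMaxWritten) {
                return new ColumnMinMax(false, null, null);
            }
            if (min == null || p.min.compareTo(min) < 0) min = p.min;
            if (max == null || p.max.compareTo(max) > 0) max = p.max;
        }
        return new ColumnMinMax(true, min, max);
    }

    /** Points 5 and 6: without min max nothing can be pruned, so a scan is required. */
    static boolean isScanRequired(ColumnMinMax minMax, String filterValue) {
        if (!minMax.minMaxWritten) {
            return true;
        }
        return filterValue.compareTo(minMax.min) >= 0
            && filterValue.compareTo(minMax.max) <= 0;
    }

    public static void main(String[] args) {
        int allowedCharCount = 10; // illustrative value only
        ColumnMinMax shortPage = buildPageMinMax(new String[] {"apple", "mango"}, allowedCharCount);
        ColumnMinMax longPage = buildPageMinMax(new String[] {"a-value-longer-than-ten"}, allowedCharCount);
        ColumnMinMax blocklet = mergeToBlocklet(new ColumnMinMax[] {shortPage, longPage});

        System.out.println(isScanRequired(shortPage, "zebra")); // false: outside [apple, mango]
        System.out.println(isScanRequired(blocklet, "zebra"));  // true: min max was not written
    }
}
```

With the flag carried next to the min max values, the driver and the executor can apply the same check: prune only when the flag says min max was written.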

*Backward compatibility* 
1. For stores prior to 1.5.0, the min max flag for all the columns will be 
set to true while loading the dataMap in the query flow. 
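
One way to picture this defaulting is sketched below. It assumes that a store written before the flag existed yields no flag list when its metadata is read (represented here as `null`); the class `LegacyMinMaxFlagSketch` and the method `resolveMinMaxFlags` are hypothetical names, not CarbonData APIs.

```java
import java.util.Arrays;
import java.util.List;

public class LegacyMinMaxFlagSketch {

    /**
     * Returns one "min max written" flag per column.
     * flagsFromFooter is assumed to be null for stores that predate the flag.
     */
    static boolean[] resolveMinMaxFlags(List<Boolean> flagsFromFooter, int columnCount) {
        boolean[] flags = new boolean[columnCount];
        if (flagsFromFooter == null) {
            // Old stores always wrote min max, so every column is treated as having it.
            Arrays.fill(flags, true);
            return flags;
        }
        for (int i = 0; i < columnCount; i++) {
            flags[i] = flagsFromFooter.get(i);
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolveMinMaxFlags(null, 3)));
        System.out.println(Arrays.toString(resolveMinMaxFlags(Arrays.asList(true, false), 2)));
    }
}
```

Defaulting to true keeps pruning behavior for old stores unchanged, since those stores contain min max for every column.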



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)