[Discussion] Partition Optimization
Posted by maheshrajus
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Partition-Optimization-tp101998.html
Dear Community,
This mail is regarding partition optimization.
*Current behaviour:* Partition column values are currently stored in the
data files on load/insert. When a query reads partition columns, we fetch
those values from the data files to fill each row.
*Proposed optimization:* The idea is to exclude partition column values
when writing data during load/insert, so the data files no longer contain
any partition column information. When a query projects partition columns,
the readers fill in the values from two sources: the projected partition
columns [passed to and read back from BlockExecutionInfo] and the blockId
[which carries each partition column name and value]. The readers then
fill the row and return it.
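To illustrate the read path described above, here is a minimal sketch in Python (not the actual CarbonData reader code). It assumes the blockId contains Hive-style `name=value` path segments for each partition column; the function names, the sample blockId layout and the dict-based row are hypothetical and chosen only for illustration.

```python
def partition_values_from_block_id(block_id):
    """Collect partition column name/value pairs from a blockId.

    Assumption: partition information appears as 'name=value' segments
    separated by '/', e.g. 'default/sales/country=IN/year=2020/part-0'.
    """
    values = {}
    for segment in block_id.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name:
            values[name] = value
    return values


def fill_partition_columns(stored_row, block_id, projection, partition_columns):
    """Build an output row in projection order.

    Non-partition columns come from the data file (stored_row), while
    partition columns are filled from the blockId instead of being read.
    """
    partition_values = partition_values_from_block_id(block_id)
    row = []
    for column in projection:
        if column in partition_columns:
            # Values parsed from the blockId are strings; a real reader
            # would convert them to the column's data type.
            row.append(partition_values[column])
        else:
            row.append(stored_row[column])
    return row
```

For example, with a stored row `{"id": 1, "amount": 250}`, the blockId `default/sales/country=IN/year=2020/part-0`, the projection `["id", "country", "amount", "year"]` and the partition columns `{"country", "year"}`, the sketch returns `[1, "IN", 250, "2020"]` without the data file ever holding `country` or `year`.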
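*Benefits:*
1) Query performance should improve, since partition column values are no
longer read from the data files.
2) Store size should be smaller than with the old behaviour, since
partition column values are no longer written to the data files.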
Please have a look: a *WIP PR [#1]* has been raised for this, and we are
currently working on the CI failures.
#1
https://github.com/apache/carbondata/pull/3695/
Please provide your valuable inputs and suggestions. Thank you in advance!
Thanks & Regards
-Mahesh Raju Somalaraju
github id: maheshrajus