Apache CarbonData Dev Mailing List archive

[Discussion] Partition Optimization

Posted by maheshrajus on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Partition-Optimization-tp101998.html

Dear Community,

This mail is regarding partition optimization.

*Current behaviour:* Currently, partition column information is stored in the
data files after a load/insert. When we query partition data, we fetch it
from the data files and fill the row.

*Proposed optimization:* The idea of this enhancement is to exclude partition
column information while loading/inserting (i.e., in the writers), so the
data files no longer contain any partition column information. When partition
data is queried, the readers fill in the partition values using the projected
partition columns (passed to BlockExecutionInfo and retrieved from it) and the
blockId (which contains the partition column name and value), then fill the
row and return it.
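To make the proposed write/read flow concrete, here is a minimal Java sketch. It is not the code from the PR: the class and method names (`PartitionFillSketch`, `stripPartitionColumns`, `partitionValuesFromBlockId`, `fillRow`) are hypothetical. It assumes the blockId carries Hive-style `column=value` path segments and that partition values are returned as strings; a real reader would convert them to the column's data type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionFillSketch {

    // Write side (hypothetical): drop partition columns from the row so the
    // data file stores only the non-partition columns.
    static List<Object> stripPartitionColumns(List<String> schema, List<Object> row,
                                              List<String> partitionColumns) {
        List<Object> stored = new ArrayList<>();
        for (int i = 0; i < schema.size(); i++) {
            if (!partitionColumns.contains(schema.get(i))) {
                stored.add(row.get(i));
            }
        }
        return stored;
    }

    // Read side (hypothetical): extract "name=value" segments from a blockId.
    // ASSUMPTION: Hive-style path segments; the real blockId format may differ.
    static Map<String, String> partitionValuesFromBlockId(String blockId) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : blockId.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    // Read side (hypothetical): build the output row in projection order.
    // Partition columns come from the blockId; all other columns come from
    // the values read from the data file.
    static List<Object> fillRow(List<String> projection, Map<String, Object> fileValues,
                                Map<String, String> partitionValues) {
        List<Object> row = new ArrayList<>();
        for (String column : projection) {
            if (partitionValues.containsKey(column)) {
                row.add(partitionValues.get(column)); // string value, not type-converted
            } else {
                row.add(fileValues.get(column));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("id", "name", "country");
        List<Object> input = List.of(1, "alice", "IN");
        System.out.println(stripPartitionColumns(schema, input, List.of("country")));

        Map<String, String> parts =
            partitionValuesFromBlockId("Segment_0/country=IN/part-0-0.carbondata");
        Map<String, Object> fileValues = Map.of("id", 1, "name", "alice");
        System.out.println(fillRow(List.of("name", "country"), fileValues, parts));
    }
}
```

Running `main` prints `[1, alice]` for the stored row (the `country` value is not written) and `[alice, IN]` for the queried row, where `IN` is recovered from the blockId rather than from the data file.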

*Benefits*:
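1) Query performance should improve, since partition values no longer have
to be read from the data files.
2) Store size should be smaller compared to the old behavior, since partition
values are no longer written into the data files.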

A *WIP PR [#1]* has been raised for this; please have a look. We are
currently working on the CI failures.

#1 https://github.com/apache/carbondata/pull/3695/

Please provide your valuable inputs and suggestions. Thank you in advance!

Thanks & Regards
-Mahesh Raju Somalaraju
github id: maheshrajus