Manish Gupta created CARBONDATA-2381:
----------------------------------------
Summary: Improve compaction performance by filling batch result in columnar format and performing IO at blocklet level
Key: CARBONDATA-2381
URL: https://issues.apache.org/jira/browse/CARBONDATA-2381
Project: CarbonData
Issue Type: Improvement
Affects Versions: 1.3.1
Reporter: Manish Gupta
Assignee: Manish Gupta
Problem: Compaction is slow compared to data load. If the compaction threshold is set to 6,6, then the minor compaction triggered after 6 loads takes almost 6-7 times as long as the total load time for those 6 loads.
Analysis:
# During compaction, result filling is done in row format. As a result, the dimension and measure data filling time increases as the number of columns increases. This happens because row-wise filling cannot take advantage of OS-cacheable buffers: we continuously move on to read data for the next column.
# Compaction uses a page-level reader flow in which both IO and uncompression are done at page level, so the IO and uncompression time increases in this model.
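
The two points above can be illustrated with a minimal, self-contained sketch. This is not CarbonData's actual reader or collector code: the class `CompactionFillSketch` and all of its methods are hypothetical names, column pages are modelled as plain `long[]` arrays, and a file is modelled as an in-memory `byte[]` with a counter standing in for real IO calls. It only contrasts the access patterns (row-wise versus columnar filling, and one read per page versus one read per blocklet).

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from CarbonData.
class CompactionFillSketch {

    // Counts simulated IO requests issued through read().
    static int ioCalls = 0;

    // Row-wise filling: for every row we touch every column in turn,
    // so consecutive accesses jump between different column buffers.
    static long[][] fillRowWise(long[][] columnPages, int rowCount) {
        long[][] rows = new long[rowCount][columnPages.length];
        for (int r = 0; r < rowCount; r++) {
            for (int c = 0; c < columnPages.length; c++) {
                rows[r][c] = columnPages[c][r];
            }
        }
        return rows;
    }

    // Columnar filling: each column is copied in one contiguous pass
    // before moving on to the next column.
    static long[][] fillColumnar(long[][] columnPages, int rowCount) {
        long[][] batch = new long[columnPages.length][];
        for (int c = 0; c < columnPages.length; c++) {
            batch[c] = Arrays.copyOf(columnPages[c], rowCount);
        }
        return batch;
    }

    // Assembles a single row from a columnar batch when a consumer needs it.
    static long[] rowAt(long[][] batch, int r) {
        long[] row = new long[batch.length];
        for (int c = 0; c < batch.length; c++) {
            row[c] = batch[c][r];
        }
        return row;
    }

    // Simulated IO: every call stands for one read request against the file.
    static byte[] read(byte[] file, int offset, int length) {
        ioCalls++;
        return Arrays.copyOfRange(file, offset, offset + length);
    }

    // Page-level flow: one read request per page.
    static byte[][] readPageLevel(byte[] file, int blockletOffset, int[] pageLengths) {
        byte[][] pages = new byte[pageLengths.length][];
        int offset = blockletOffset;
        for (int i = 0; i < pageLengths.length; i++) {
            pages[i] = read(file, offset, pageLengths[i]);
            offset += pageLengths[i];
        }
        return pages;
    }

    // Blocklet-level flow: one read request for the whole blocklet,
    // then the pages are sliced out of the in-memory buffer.
    static byte[][] readBlockletLevel(byte[] file, int blockletOffset, int[] pageLengths) {
        int total = 0;
        for (int len : pageLengths) {
            total += len;
        }
        byte[] blocklet = read(file, blockletOffset, total);
        byte[][] pages = new byte[pageLengths.length][];
        int offset = 0;
        for (int i = 0; i < pageLengths.length; i++) {
            pages[i] = Arrays.copyOfRange(blocklet, offset, offset + pageLengths[i]);
            offset += pageLengths[i];
        }
        return pages;
    }
}
```

Both fill methods produce the same values, and both read methods return the same pages. The difference is that the columnar fill walks each buffer sequentially, and the blocklet-level read issues a single request regardless of how many pages the blocklet contains.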
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)