xuchuanyin created CARBONDATA-2304:
--------------------------------------
Summary: Enhance compaction performance by enabling prefetch
Key: CARBONDATA-2304
URL: https://issues.apache.org/jira/browse/CARBONDATA-2304
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: xuchuanyin
Assignee: xuchuanyin
During compaction, CarbonData queries the segments and retrieves the rows one by one, then sorts the rows and writes the final CarbonData file.
Currently, row retrieval performs poorly, so prefetching the rows should improve compaction performance.
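A minimal sketch of the prefetch idea, assuming a simple producer/consumer design: a background thread reads rows from the segment iterator in batches into a bounded queue, while the sort step consumes them from that queue instead of waiting on each row read. The class PrefetchingRowIterator and its batchSize/maxBatches parameters are hypothetical and are not CarbonData's actual implementation or API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// Hypothetical sketch (not CarbonData's API): a background thread reads rows from a
// slow source iterator in batches and buffers them in a bounded queue, so the
// consumer (e.g. the compaction sort step) rarely waits on a single row read.
class PrefetchingRowIterator<T> implements Iterator<T>, AutoCloseable {
    private final List<T> endMarker = new ArrayList<>(); // sentinel: no more batches
    private final BlockingQueue<List<T>> queue;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private volatile Throwable failure;
    private Iterator<T> current;
    private boolean finished;

    PrefetchingRowIterator(Iterator<T> source, int batchSize, int maxBatches) {
        // Bounded queue: at most maxBatches batches are buffered ahead of the consumer.
        this.queue = new ArrayBlockingQueue<>(maxBatches);
        executor.submit(() -> {
            try {
                List<T> batch = new ArrayList<>(batchSize);
                while (source.hasNext()) {
                    batch.add(source.next());
                    if (batch.size() == batchSize) {
                        queue.put(batch); // blocks while the queue is full
                        batch = new ArrayList<>(batchSize);
                    }
                }
                if (!batch.isEmpty()) {
                    queue.put(batch);
                }
                queue.put(endMarker);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // close() was called: stop quietly
            } catch (Throwable t) {
                failure = t; // remember the error, then wake the consumer
                try {
                    queue.put(endMarker);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        executor.shutdown(); // the worker thread exits once the producer task ends
    }

    @Override
    public boolean hasNext() {
        // Take the next prefetched batch whenever the current one is exhausted.
        while (!finished && (current == null || !current.hasNext())) {
            try {
                List<T> batch = queue.take();
                if (batch == endMarker) {
                    finished = true;
                } else {
                    current = batch.iterator();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for rows", e);
            }
        }
        if (finished && failure != null) {
            throw new IllegalStateException("prefetch failed", failure);
        }
        return !finished;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    @Override
    public void close() {
        executor.shutdownNow(); // interrupts the producer if it is blocked on put()
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = IntStream.range(0, 10).boxed().iterator();
        long sum = 0;
        try (PrefetchingRowIterator<Integer> it = new PrefetchingRowIterator<>(rows, 3, 2)) {
            while (it.hasNext()) {
                sum += it.next();
            }
        }
        System.out.println(sum); // prints 45
    }
}
```

In this sketch, batching spreads the queue synchronization cost over many rows, and the bounded queue caps the extra memory at roughly batchSize * maxBatches rows; both values are assumptions that would need tuning against real compaction workloads.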
In my local tests, compacting 4 segments of 100,000 rows each took 30s with prefetch versus 50s without it.
In tests on a larger cluster, compacting 6 segments of 18GB raw data each took 45min with prefetch versus 57min without it.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)