[ https://issues.apache.org/jira/browse/CARBONDATA-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li resolved CARBONDATA-2018.
----------------------------------
Resolution: Fixed
> Optimization in reading/writing for sort temp row during data loading
> ---------------------------------------------------------------------
>
> Key: CARBONDATA-2018
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2018
> Project: CarbonData
> Issue Type: Improvement
> Components: data-load
> Affects Versions: 1.3.0
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Fix For: 1.4.0
>
> Time Spent: 13h 50m
> Remaining Estimate: 0h
>
> # SCENARIO
> Currently, during the sort step of CarbonData data loading, records are partially sorted and spilled to disk. CarbonData then reads these records back and merge-sorts them.
> Because the sort step is CPU-intensive, we can optimize the serialization/deserialization of these rows as they are written and read, reducing the CPU spent parsing them.
> This should improve data loading performance.
> # RESOLVE
> We can pick out the non-sort fields in the row, pack them into a byte array, and skip parsing them.
> # RESULT
> I tested it on my cluster and saw a performance gain of about 8% (74MB/s/Node -> 81MB/s/Node).
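
For readers who want a concrete picture of the approach described under RESOLVE, here is a minimal sketch. It is not CarbonData's actual code: the class SortTempRowSketch, its nested SortTempRow type, the int-only sort keys, and the (String, double) layout of the non-sort fields are all assumptions made for illustration. The idea shown is that the non-sort fields are serialized once into a byte array, the spill writer and reader copy that array as a length-prefixed blob, and only the sort keys are decoded while merging.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; these names do not come from CarbonData.
class SortTempRowSketch {

  /** A row split into the fields the comparator needs and an opaque remainder. */
  public static final class SortTempRow {
    public final int[] sortKeys;     // decoded, because the merge sort compares them
    public final byte[] packedRest;  // non-sort fields, kept serialized

    public SortTempRow(int[] sortKeys, byte[] packedRest) {
      this.sortKeys = sortKeys;
      this.packedRest = packedRest;
    }
  }

  /** Serialize the non-sort fields once, before the row is first spilled. */
  public static byte[] packNonSortFields(String name, double measure) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeUTF(name);
    out.writeDouble(measure);
    out.flush();
    return bytes.toByteArray();
  }

  /** Spill a row: sort keys field by field, the rest as one length-prefixed blob. */
  public static void write(SortTempRow row, DataOutputStream out) throws IOException {
    out.writeInt(row.sortKeys.length);
    for (int key : row.sortKeys) {
      out.writeInt(key);
    }
    out.writeInt(row.packedRest.length);
    out.write(row.packedRest);
  }

  /** Read a row back for merging: the blob is copied, never parsed. */
  public static SortTempRow read(DataInputStream in) throws IOException {
    int[] keys = new int[in.readInt()];
    for (int i = 0; i < keys.length; i++) {
      keys[i] = in.readInt();
    }
    byte[] rest = new byte[in.readInt()];
    in.readFully(rest);
    return new SortTempRow(keys, rest);
  }

  /** Decode the packed fields only when the values are finally needed. */
  public static Object[] unpackNonSortFields(byte[] packed) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed));
    return new Object[] {in.readUTF(), in.readDouble()};
  }
}
```

In this sketch a row can pass through any number of spill and merge rounds while its non-sort fields are parsed at most once, at the end. That is where the CPU saving described above would come from, under these assumptions.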