[jira] [Created] (CARBONDATA-2018) Optimization in reading/writing for sort temp row during data loading

Akash R Nilugal (Jira)

xuchuanyin created CARBONDATA-2018:
--------------------------------------

Summary: Optimization in reading/writing for sort temp row during data loading
Key: CARBONDATA-2018
URL: https://issues.apache.org/jira/browse/CARBONDATA-2018
Project: CarbonData
Issue Type: Improvement
Components: data-load
Affects Versions: 1.3.0
Reporter: xuchuanyin
Assignee: xuchuanyin
Fix For: 1.3.0

# SCENARIO

Currently, during the sort step of CarbonData data loading, records are partially sorted and spilled to disk. CarbonData then reads these records back and performs a merge sort.

Since the sort step is CPU-intensive, we can optimize the serialization/deserialization of these rows when writing and reading them, reducing the CPU spent parsing the rows.

This should improve data loading performance.
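To make the scenario concrete, the following Python sketch models the existing flow. Rows are sorted in chunks, each sorted chunk is serialized ("spilled"), and the chunks are read back and combined with a k-way merge. It is an illustration under stated assumptions, not CarbonData's Java implementation: in-memory byte strings stand in for sort temp files, `pickle` stands in for CarbonData's row serialization, and the row shape and function names are hypothetical. Note that every field of every row is serialized on write and deserialized on read, which is where the CPU cost described above comes from.

```python
import heapq
import pickle
from typing import Iterator, List, Tuple

# Hypothetical row shape: (sort key, dimension value, measure bytes).
Row = Tuple[int, str, bytes]


def spill_sorted_chunks(rows: List[Row], chunk_size: int) -> List[bytes]:
    """Sort rows chunk by chunk and serialize each sorted chunk.

    Each returned byte string stands in for one sort temp file on disk.
    """
    spilled = []
    for start in range(0, len(rows), chunk_size):
        chunk = sorted(rows[start:start + chunk_size], key=lambda r: r[0])
        # Every field of every row is serialized here.
        spilled.append(pickle.dumps(chunk))
    return spilled


def read_chunk(blob: bytes) -> Iterator[Row]:
    """Deserialize one spilled chunk (every field is parsed back)."""
    yield from pickle.loads(blob)


def merge_spilled(spilled: List[bytes]) -> List[Row]:
    """K-way merge of the sorted chunks on the sort key."""
    return list(heapq.merge(*(read_chunk(b) for b in spilled), key=lambda r: r[0]))
```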

# RESOLVE

We can pick out the fields of each row that do not take part in sorting, pack them into a byte array, and skip parsing them.
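A minimal Python sketch of this idea follows. It assumes a hypothetical row made of one integer sort key plus two no-sort fields (a string and a float). The no-sort fields are packed into a single byte array when the row is first built. Writing and reading the sort temp data then encodes and decodes only the sort key, and the byte array is copied as an opaque slice. The per-row layout (8-byte key, 4-byte length, blob), the function names, and the use of `struct` are all assumptions for illustration. The actual CarbonData change is in Java, and its row format may differ.

```python
import heapq
import struct
from typing import Iterator, List, Tuple

# Hypothetical per-row layout: 8-byte sort key, 4-byte blob length, then the
# blob holding all no-sort fields, packed once and never parsed while sorting.
HEADER = struct.Struct(">qi")


def pack_no_sort_fields(name: str, amount: float) -> bytes:
    """Pack the fields that do not take part in sorting into one byte array."""
    encoded = name.encode("utf-8")
    return struct.pack(">i", len(encoded)) + encoded + struct.pack(">d", amount)


def unpack_no_sort_fields(blob: bytes) -> Tuple[str, float]:
    """Decode the blob; in this sketch it is only needed after the final merge."""
    (n,) = struct.unpack_from(">i", blob, 0)
    name = blob[4:4 + n].decode("utf-8")
    (amount,) = struct.unpack_from(">d", blob, 4 + n)
    return name, amount


def write_sorted_chunk(rows: List[Tuple[int, bytes]]) -> bytes:
    """Sort (key, blob) rows by key and serialize them as one spilled chunk."""
    out = bytearray()
    for key, blob in sorted(rows, key=lambda r: r[0]):
        out += HEADER.pack(key, len(blob))
        out += blob  # copied verbatim: no per-field serialization
    return bytes(out)


def read_chunk(chunk: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (key, blob) pairs; the blob is sliced out, not parsed."""
    offset = 0
    while offset < len(chunk):
        key, length = HEADER.unpack_from(chunk, offset)
        offset += HEADER.size
        yield key, chunk[offset:offset + length]
        offset += length


def merge_chunks(chunks: List[bytes]) -> List[Tuple[int, bytes]]:
    """Merge-sort spilled chunks, comparing only the decoded sort key."""
    return list(heapq.merge(*(read_chunk(c) for c in chunks), key=lambda r: r[0]))
```

In this sketch the no-sort fields are parsed once, when the merged rows are finally consumed, rather than on every spill and merge pass. The saving therefore grows with the number and width of the no-sort fields.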

# RESULT

I tested this on my cluster and saw a performance gain of about 8% (74 MB/s/node -> 81 MB/s/node).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)