Re: [Discussion] Optimize the Update Performance
Posted by haomarch on May 14, 2020; 4:21am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Optimize-the-Update-Performance-tp96001p96017.html
I have several ideas to optimize the update performance:
1. Reduce the storage size of the tupleId:
The tupleId is too long, which leads to heavy shuffle IO overhead when joining
the change table with the target table.
2. Avoid converting String to UTF8String during row processing.
Before rows are written into the delta files, the conversion from String to
UTF8String costs some performance.
Code: "UTF8String.fromString(row.getString(tupleId))"
3. For DELETE operations in the MergeDataCommand, we shouldn't make all the
columns of the change table take part in the JOIN operation. Only the "key"
column is needed.
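For point 1, here is a minimal sketch of one way to shrink the tupleId. It assumes, hypothetically, that each tupleId component (for example segment, block, blocklet, page and row) can be mapped to a non-negative int, say by replacing a block file name with an index. The thread does not spell out the real tupleId layout, so the component list and the class name CompactTupleId are illustrative only. Each component is written as an unsigned LEB128 varint, so a small id takes one byte instead of several characters plus a separator.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical compact tupleId encoding (illustrative only, not CarbonData's format).
 * Assumes every tupleId component has already been mapped to a non-negative int.
 */
final class CompactTupleId {
    private CompactTupleId() {}

    /** Writes each component as an unsigned LEB128 varint; values below 128 take one byte. */
    static byte[] encode(int... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int p : parts) {
            if (p < 0) {
                throw new IllegalArgumentException("components must be non-negative: " + p);
            }
            int v = p;
            while ((v & ~0x7F) != 0) {        // more than 7 significant bits remain
                out.write((v & 0x7F) | 0x80); // low 7 bits with the continuation bit set
                v >>>= 7;
            }
            out.write(v);                     // last byte of this component
        }
        return out.toByteArray();
    }

    /** Inverse of encode: reads varints until the bytes run out. */
    static int[] decode(byte[] bytes) {
        List<Integer> parts = new ArrayList<>();
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {            // continuation bit clear: component done
                parts.add(value);
                value = 0;
                shift = 0;
            } else {
                shift += 7;
            }
        }
        int[] result = new int[parts.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = parts.get(i);
        }
        return result;
    }

    /** UTF-8 size of a '/'-separated string tupleId, for comparison. */
    static int utf8Size(String tupleId) {
        return tupleId.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For example, encode(0, 12, 0, 3, 31999) takes 7 bytes. The string "0/12/0/3/31999" takes 14 bytes in UTF-8. The saving would be larger when the string form carries long file names.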
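For point 2, here is a self-contained sketch of the overhead being described, written so it runs without Spark on the classpath. In the current path, a value that is already UTF-8 encoded is decoded into a java.lang.String and then encoded back into UTF-8 bytes before it is written. DeltaWriterSketch and both of its methods are hypothetical and only model the delta-file write. In Spark itself, if the row at that point is an InternalRow (an assumption, because the snippet above does not show the row type), InternalRow.getUTF8String(ordinal) returns the UTF8String directly and skips the String round trip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical model of writing tupleIds into a delta file, one per line. */
final class DeltaWriterSketch {
    private DeltaWriterSketch() {}

    /**
     * Current style: each value was already decoded into a String upstream
     * (like row.getString), so it has to be re-encoded to UTF-8 here.
     */
    static byte[] writeFromStrings(List<String> tupleIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String id : tupleIds) {
            byte[] b = id.getBytes(StandardCharsets.UTF_8); // per-row encode + allocation
            out.write(b, 0, b.length);
            out.write('\n');
        }
        return out.toByteArray();
    }

    /** Proposed style: values stay in UTF-8 form end to end and are copied as-is. */
    static byte[] writeFromUtf8(List<byte[]> tupleIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] b : tupleIds) {
            out.write(b, 0, b.length); // no decode/encode round trip
            out.write('\n');
        }
        return out.toByteArray();
    }
}
```

Both methods produce identical bytes. The second one avoids the per-row String and the re-encoded byte array.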
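For point 3, here is a minimal in-memory sketch of projecting the change table down to its key column before the join. It assumes the DELETE branch only needs to know which target rows match on the key. If the delete clause also has a condition on other change-table columns, those columns would still be needed. Because only the match matters, the join is effectively a semi-join, and the build side holds one key per change row instead of the whole row. DeleteJoinSketch, TargetRow and tupleIdsToDelete are hypothetical names, not CarbonData APIs. In Spark terms, this corresponds to selecting only the key column of the change dataset before joining, or using a left_semi join.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the DELETE branch of a merge, joining on the key only. */
final class DeleteJoinSketch {
    private DeleteJoinSketch() {}

    /** Hypothetical target row: the join key plus the tupleId locating the row. */
    static final class TargetRow {
        final String key;
        final String tupleId;

        TargetRow(String key, String tupleId) {
            this.key = key;
            this.tupleId = tupleId;
        }
    }

    /**
     * Returns the tupleIds of target rows whose key appears in the change table.
     * Only the key column of each change row is kept, so the hash set holds
     * one value per change row instead of every column.
     */
    static List<String> tupleIdsToDelete(List<Map<String, Object>> changeRows,
                                         String keyColumn,
                                         List<TargetRow> target) {
        Set<Object> changeKeys = new HashSet<>();
        for (Map<String, Object> row : changeRows) {
            changeKeys.add(row.get(keyColumn)); // projection: drop all non-key columns
        }
        List<String> result = new ArrayList<>();
        for (TargetRow t : target) {
            if (changeKeys.contains(t.key)) {   // semi-join probe on the key only
                result.add(t.tupleId);
            }
        }
        return result;
    }
}
```

With change rows keyed "a" and "c" and target rows (a, t1), (b, t2), (c, t3), the method returns [t1, t3].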