Apache CarbonData Dev Mailing List archive - Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

Posted by akashrn5 on Sep 04, 2020; 4:49am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Improve-the-reading-writing-performance-on-the-big-tablestatus-file-tp99716p99811.html

Hi David,

After discussing with you, it is a little clearer. Let me summarize in
a few lines.

*Goals*
1. Reduce the size of the status file (which reduces the overall size by some MBs).
2. Make the table status file less prone to failures, and make reading it
fast.

*How your solutions map to the above goals*

1. Use a compressor to compress the table status file, so that less data is
read from storage and the decompression happens in memory, which should make
the read faster.
2. To make it less prone to failure, *+1 for solution3*, which can be combined
with a little bit of solution2 (the new format of table status and the trace
folder structure) and the delta file of solution3. This separates the read and
write paths, so that reads will be faster and failures are easier to avoid,
which improves reliability.
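
To make point 1 concrete, here is a minimal sketch of compressing the status content and decompressing it in memory on read. It assumes the status is a JSON list of segment entries and uses gzip from the Python standard library purely for illustration; the field names, the function names, and the choice of gzip are my assumptions, not what the proposal specifies.

```python
import gzip
import json


def encode_status(entries):
    """Serialize the entries to compact JSON and gzip-compress the bytes."""
    raw = json.dumps(entries, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decode_status(blob):
    """Decompress in memory and parse the JSON back into a list of entries."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical entries; a real table status carries more fields per segment.
entries = [{"segment_id": str(i), "status": "Success"} for i in range(1000)]

raw_size = len(json.dumps(entries, separators=(",", ":")).encode("utf-8"))
blob = encode_status(entries)

# The compressed blob is what would be written to and read from storage.
print(raw_size, len(blob))
```

Because the entries are highly repetitive, the compressed blob is much smaller than the raw JSON, so the reader fetches fewer bytes and pays only an in-memory decompression cost.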

Suggestion: One more point is to maintain a cache of the details after the
first read, instead of reading every time. Only once the status-uuid is
updated do we read again; until then we read from the cache. This will make
reads faster and help our query performance.
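
A rough sketch of that caching idea, assuming there is a cheap way to read the current status-uuid and a more expensive way to load the full details. `StatusCache`, `read_uuid`, and `load_details` are hypothetical names used only for this illustration.

```python
_MISSING = object()  # sentinel so that a first uuid of None still triggers a load


class StatusCache:
    """Caches the loaded details and reloads only when the status-uuid changes."""

    def __init__(self, read_uuid, load_details):
        self._read_uuid = read_uuid        # cheap call: returns the current status-uuid
        self._load_details = load_details  # expensive call: reads the full details
        self._cached_uuid = _MISSING
        self._cached_details = None
        self.loads = 0                     # number of expensive loads performed

    def get(self):
        current = self._read_uuid()
        if current != self._cached_uuid:
            # The uuid changed (or this is the first read): reload and remember it.
            self._cached_details = self._load_details()
            self._cached_uuid = current
            self.loads += 1
        return self._cached_details


# Usage with an in-memory stand-in for the table status.
state = {"uuid": "u1", "details": ["segment_0"]}
cache = StatusCache(lambda: state["uuid"], lambda: list(state["details"]))

first = cache.get()
second = cache.get()   # same uuid, served from the cache

state["uuid"] = "u2"
state["details"].append("segment_1")
third = cache.get()    # uuid changed, so the details are reloaded
```

The uuid is read before the details, so if a writer updates the status in between, the next `get()` sees a different uuid and reloads rather than serving stale data indefinitely.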

I suggest you create a *JIRA and prepare a design document*; there we can
cover the many impact areas and *avoid fixing small bugs after implementation.*
