Apache CarbonData Dev Mailing List archive - Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

Posted by Ajantha Bhat on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Improve-the-reading-writing-performance-on-the-big-tablestatus-file-tp99716p99850.html

Hi David,

a) Compressing the table status file is good, but we need to check the
decompression overhead and how much overall benefit we can get.
b) I suggest keeping multiple files of 10 MB each (or a configurable size)
and then reading them in a distributed way.
c) Once all the table status files are read, it is better to cache them at
the driver in a multilevel hash map [the first level keyed by segment
status and the second level by segment id].
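
For point a), one way to quantify the trade-off before committing to it is to compress a synthetic table status and time the decompression plus JSON parsing. This is a minimal sketch, not CarbonData code: the entry fields are simplified and hypothetical (the real tablestatus schema has more fields), and gzip from the Python standard library only stands in for whichever compressor is finally chosen.

```python
import gzip
import json
import time


def make_table_status(num_segments):
    """Build a synthetic list of segment entries (hypothetical, simplified fields)."""
    return [
        {
            "segmentId": str(i),
            "status": "Success" if i % 10 else "Marked for Delete",
            "loadStartTime": 1599000000000 + i,
            "dataSize": 1024 * i,
        }
        for i in range(num_segments)
    ]


def measure(entries):
    """Return (raw bytes, compressed bytes, seconds spent decompressing and parsing)."""
    raw = json.dumps(entries).encode("utf-8")
    compressed = gzip.compress(raw)
    start = time.perf_counter()
    restored = json.loads(gzip.decompress(compressed).decode("utf-8"))
    elapsed = time.perf_counter() - start
    if restored != entries:  # the round trip must be lossless
        raise ValueError("round trip changed the table status")
    return len(raw), len(compressed), elapsed


raw_size, compressed_size, seconds = measure(make_table_status(10_000))
print(f"raw={raw_size} B, compressed={compressed_size} B, "
      f"decompress+parse={seconds:.4f} s")
```

Running the same measurement on a copy of a real, large tablestatus file would give the numbers that actually matter for the decision.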
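
For point b), the idea is to cap each status file at a configurable size and read the pieces in parallel. In this sketch the file name `tablestatus_part_<n>` and the JSON-lines layout are hypothetical, and a local thread pool stands in for a distributed read (in practice the paths could be handed to executors instead). A single entry larger than the limit still ends up alone in one oversized file.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_chunks(entries, directory, max_bytes=10 * 1024 * 1024):
    """Write entries as JSON lines, starting a new file before max_bytes is exceeded."""
    paths, lines, size = [], [], 0

    def flush(chunk):
        path = os.path.join(directory, f"tablestatus_part_{len(paths)}")
        # newline="" keeps "\n" untranslated so on-disk size matches the count below
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.writelines(chunk)
        paths.append(path)

    for entry in entries:
        line = json.dumps(entry) + "\n"
        line_size = len(line.encode("utf-8"))
        if lines and size + line_size > max_bytes:
            flush(lines)
            lines, size = [], 0
        lines.append(line)
        size += line_size
    if lines:
        flush(lines)
    return paths


def read_chunk(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


def read_all(paths, workers=4):
    """Read all chunk files in parallel; pool.map keeps the original file order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [entry for chunk in pool.map(read_chunk, paths) for entry in chunk]


with tempfile.TemporaryDirectory() as work_dir:
    entries = [{"segmentId": str(i), "status": "Success"} for i in range(1000)]
    paths = write_chunks(entries, work_dir, max_bytes=4096)  # tiny limit for the demo
    print(f"{len(entries)} entries in {len(paths)} files, "
          f"round trip ok: {read_all(paths) == entries}")
```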
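
For point c), a two-level map keyed first by segment status and then by segment id lets the driver fetch, for example, all valid segments without scanning the others. A minimal sketch, assuming each entry carries hypothetical `status` and `segmentId` keys and that segment ids are unique; the status strings are illustrative.

```python
from collections import defaultdict


class TableStatusCache:
    """Driver-side cache: segment status -> segment id -> entry."""

    def __init__(self, entries):
        self._by_status = defaultdict(dict)
        for entry in entries:
            self._by_status[entry["status"]][entry["segmentId"]] = entry

    def segments_with_status(self, status):
        # Only the bucket for this status is touched; .get avoids creating empty buckets.
        return list(self._by_status.get(status, {}).values())

    def get(self, segment_id):
        # A lookup by id alone probes each status bucket; there are only a few statuses.
        for by_id in self._by_status.values():
            if segment_id in by_id:
                return by_id[segment_id]
        return None


cache = TableStatusCache([
    {"segmentId": "0", "status": "Success"},
    {"segmentId": "1", "status": "Marked for Delete"},
    {"segmentId": "2", "status": "Success"},
])
print([e["segmentId"] for e in cache.segments_with_status("Success")])  # ['0', '2']
print(cache.get("1")["status"])  # Marked for Delete
```

Putting status at the first level matches the common query path (valid segments only); if lookups by id dominate instead, the order of the two levels could be swapped.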

Thanks,
Ajantha

On Fri, Sep 4, 2020 at 10:19 AM akashrn5 <[hidden email]> wrote:

> Hi David,
>
> After discussing with you it is a little clearer. Let me summarize in a
> few lines.
>
> *Goals*
> 1. Reduce the size of the status file (which reduces the overall size by
> some MBs).
> 2. Make the table status file less prone to failures, and make reading
> it faster.
>
> *Your solutions for the above goals*
>
> 1. Use a compressor to compress the table status file, so that the read
> happens in memory and is faster.
> 2. To make it less prone to failure, *+1 for solution3*, which can be
> combined with a little of solution2 (for the new table status format and
> trace folder structure) and the delta file of solution3, to separate reads
> from writes so that reads are faster and failures are avoided, improving
> reliability.
>
> Suggestion: One more point is to cache the details after the first read
> instead of reading the file every time. Only once the status-uuid is
> updated do we need to read it again; until then we can read from the
> cache. This will make reads faster and help our queries.
>
> I suggest you create a *JIRA and prepare a design document*, where we can
> cover the many impact areas and *avoid fixing small bugs after
> implementation.*
>
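
The caching suggestion in the quoted message (re-read only when the status-uuid changes) could look roughly like the following. This is a hypothetical sketch: `read_uuid` is assumed to be a cheap call (for example reading a small marker), `read_status` the expensive full read, and it assumes writers publish a new uuid only after the new status content is completely written. It is single-threaded and not safe for concurrent callers as written.

```python
class UuidGuardedCache:
    """Keeps the parsed table status and re-reads it only when the uuid changes."""

    def __init__(self, read_uuid, read_status):
        self._read_uuid = read_uuid      # assumed cheap: returns the current status-uuid
        self._read_status = read_status  # assumed expensive: full table status read
        self._uuid = None
        self._entries = None
        self.reads = 0                   # number of expensive reads, for illustration

    def get(self):
        current = self._read_uuid()
        if self._entries is None or current != self._uuid:
            # The uuid is read before the content; with writers publishing the uuid
            # last, a race can only cause one extra re-read, never a stale result.
            self._entries = self._read_status()
            self._uuid = current
            self.reads += 1
        return self._entries


state = {"uuid": "a1", "entries": [{"segmentId": "0", "status": "Success"}]}
cache = UuidGuardedCache(lambda: state["uuid"], lambda: list(state["entries"]))
cache.get()
cache.get()
print(cache.reads)  # 1: the second call is served from the cache
state["entries"].append({"segmentId": "1", "status": "Success"})
state["uuid"] = "b2"
print(len(cache.get()), cache.reads)  # 2 2
```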
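
The delta-file idea from solution3 is not detailed in this thread. As an assumption, here is a generic base-plus-delta-log pattern: writers only append, readers merge the base file with the delta lines (a later line for the same segment wins), and an occasional compaction folds the delta back into the base. File names are hypothetical; `os.replace` is atomic only on a local POSIX filesystem, and tablestatus files on HDFS or object stores have different rename semantics. Compaction also assumes no concurrent writer, for example that it runs under the table lock.

```python
import json
import os
import tempfile


def append_delta(delta_path, entry):
    """Writers only append one JSON line; the base file is not rewritten."""
    with open(delta_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_merged(base_path, delta_path):
    """Merge the base snapshot with the delta log; a later line for a segment wins."""
    merged = {}
    if os.path.exists(base_path):
        with open(base_path, encoding="utf-8") as f:
            for entry in json.load(f):
                merged[entry["segmentId"]] = entry
    if os.path.exists(delta_path):
        with open(delta_path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # e.g. a half-written line left by a crashed writer
                merged[entry["segmentId"]] = entry
    return list(merged.values())


def compact(base_path, delta_path):
    """Fold the delta into a new base file, then drop the delta (no concurrent writers)."""
    merged = read_merged(base_path, delta_path)
    tmp_path = base_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(merged, f)
    os.replace(tmp_path, base_path)  # atomic rename on a POSIX filesystem
    if os.path.exists(delta_path):
        os.remove(delta_path)


with tempfile.TemporaryDirectory() as work_dir:
    base = os.path.join(work_dir, "tablestatus")
    delta = os.path.join(work_dir, "tablestatus.delta")
    append_delta(delta, {"segmentId": "0", "status": "Insert In Progress"})
    append_delta(delta, {"segmentId": "0", "status": "Success"})
    append_delta(delta, {"segmentId": "1", "status": "Success"})
    print(read_merged(base, delta))
    compact(base, delta)
    print(read_merged(base, delta))  # same entries, now read from the base file
```

Because reads never block on the rewrite of one large file, this pattern separates the read and write paths in the way the quoted message describes, at the cost of a merge step on every read until the next compaction.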