[Background]
The size of one segment metadata entry in the tablestatus file is currently about 200 bytes. If a table has 1 million segments with a mean segment size of 1GB (that is, a 1PB table), the tablestatus file reaches about 200MB.

Any read or write operation on a tablestatus file of this size is costly and performs badly. Under concurrency, it easily leads to read failures on a tablestatus file that is being modified, and to write lock wait timeouts.

[Motivation & Goal]
For Carbon to support big tables larger than 1PB, we should reduce the tablestatus size to improve read/write performance. It is also better to separate reading and writing into different tablestatus files, to avoid reading a tablestatus file that is being modified.

[Modification]
There are three solutions, as follows.

solution 1: compress the tablestatus file
1) use gzip to compress the tablestatus file (200MB -> 20MB)
2) keep all previous lock mechanisms
3) support backward compatibility
   Read: if the magic number (0x1F8B) exists, uncompress the tablestatus file first.
   Write: compress the tablestatus file directly.

solution 2: based on solution 1, separate reading and writing into different tablestatus files
1) new tablestatus format
   {
     "statusFileName":"status-uuid1",
     "updateStatusFileName":"updatestatus-timestamp1",
     "historyStatusFileName":"status.history",
     "segmentMaxId":"1000"
   }
   Keep this file small at all times, and reload it for each operation.
2) add a Metadata/tracelog folder
   It stores the files status-uuid1, updatestatus-timestamp1 and status.history.
3) use gzip to compress the status-uuid1 file
4) support backward compatibility
   Read: if the file starts with "[{", go to the old reading flow; if it starts with "{", go to the new flow.
   Write: generate a new status-uuid1 file and updatestatus file, and store their names in the tablestatus file.
5) clean stale files
   If stale files were created more than 1 hour ago (the query timeout), we can remove them. Loading, compaction and clean files can trigger this action.
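The backward-compatible read path of solutions 1 and 2 can be sketched as follows. This is a minimal Python illustration, not CarbonData code: the function name `decode_tablestatus` and the segment field names used with it are hypothetical. Only the gzip magic number check and the "[{" versus "{" dispatch come from the proposal, and the sketch assumes that testing the first character alone is acceptable so that an empty old-format list is also handled.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_tablestatus(raw: bytes):
    """Return ("old", segment_list) or ("new", pointer_dict) for raw file bytes."""
    # Solution 1: decompress transparently when the gzip magic number is present.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8").lstrip()
    # Solution 2: the old format is a JSON array of segment entries,
    # the new format is one small JSON object that points to other files.
    # Assumption: checking "[" instead of "[{" so an empty array still parses.
    if text.startswith("["):
        return "old", json.loads(text)
    if text.startswith("{"):
        return "new", json.loads(text)
    raise ValueError("unrecognized tablestatus content")
```

Calling `decode_tablestatus` on an uncompressed old file, on the same bytes passed through `gzip.compress`, and on a new-format pointer file should take the old, old and new branches respectively.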
solution 3: based on solution 2, support a tablestatus delta
1) new tablestatus file format
   {
     "statusFileName":"status-uuid1",
     "deltaStatusFileName": "status-uuid2.delta",
     "updateStatusFileName":"updatestatus-timestamp1",
     "historyStatusFileName":"status.history",
     "segmentMaxId":"1000"
   }
2) the tablestatus delta stores the recent modifications
   Write: if the status file reaches 10MB, start writing a delta file. If the delta file reaches 1MB, merge the delta into the status file and set deltaStatusFileName to null.
   Read: if deltaStatusFileName is not null in the new tablestatus file, read the delta status and combine the status file with it.

Please vote for all solutions.

-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
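The write and read rules of solution 3 can be sketched roughly as below. This is an illustrative Python model only: segment lists are held in memory instead of files, a `delta` of `None` stands for a null deltaStatusFileName, and the helper names and the "segmentId" field are hypothetical. The 10MB and 1MB thresholds are the ones from the proposal.

```python
import json

STATUS_LIMIT = 10 * 1024 * 1024  # start writing a delta once status reaches 10MB
DELTA_LIMIT = 1 * 1024 * 1024    # merge the delta back once it reaches 1MB


def size_of(entries):
    """Approximate on-disk size as the length of the JSON encoding."""
    return len(json.dumps(entries).encode("utf-8"))


def apply_delta(status, delta):
    """Overlay delta entries on status entries; the delta wins per segment id."""
    merged = {e["segmentId"]: e for e in status}
    for e in delta:
        merged[e["segmentId"]] = e
    return list(merged.values())


def record_change(status, delta, entry,
                  status_limit=STATUS_LIMIT, delta_limit=DELTA_LIMIT):
    """Return the new (status, delta) pair after recording one segment change."""
    if delta is None and size_of(status) < status_limit:
        # Small status file: rewrite it directly, no delta needed.
        return apply_delta(status, [entry]), None
    delta = apply_delta(delta or [], [entry])
    if size_of(delta) >= delta_limit:
        # Delta grew too large: merge it and reset the delta to null.
        return apply_delta(status, delta), None
    return status, delta


def read_segments(status, delta):
    """Read path: combine status with the delta only when a delta exists."""
    return status if delta is None else apply_delta(status, delta)
```

Passing small limits such as `status_limit=1` makes it easy to exercise the delta branch without building a 10MB list.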
solution3, +1
-----
My English name is Sunday
Adding solution 4, which separates the status file by segment status.

*solution 4:* based on solution 2, support status.inprogress
1) new tablestatus file format
   {
     "statusFileName":"status-uuid1",
     "inProgressStatusFileName": "status-uuid2.inprogess",
     "updateStatusFileName":"updatestatus-timestamp1",
     "historyStatusFileName":"status.history",
     "segmentMaxId":"1000"
   }
2) the status.inprogess file stores the in-progress segment metadata
   Write: at the beginning of loading/compaction, add the in-progress segment metadata to status-uuid2.inprogess. At the end, move it to status-uuid1.
   Read: queries read status-uuid1 only. Other cases read status-uuid2.inprogess if needed.

-----
Best Regards
David Cai
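The in-progress flow of solution 4 can be illustrated with a small Python sketch. It models the two files as in-memory lists; the function names, the "segmentId" and "status" fields, and the status strings are hypothetical and chosen only for illustration.

```python
def begin_operation(in_progress, segment_id):
    """At the start of a load/compaction, register the segment as in progress."""
    entry = {"segmentId": segment_id, "status": "In Progress"}
    return in_progress + [entry]


def finish_operation(status, in_progress, segment_id, final_status="Success"):
    """At the end, move the segment from the in-progress list to the status list."""
    moved = [dict(e, status=final_status)
             for e in in_progress if e["segmentId"] == segment_id]
    if not moved:
        raise KeyError(f"segment {segment_id} is not in progress")
    remaining = [e for e in in_progress if e["segmentId"] != segment_id]
    return status + moved, remaining


def segments_for_query(status):
    """Queries look at the main status list only, never at the in-progress list."""
    return [e for e in status if e["status"] == "Success"]
```

With this split, a query never has to skip over in-progress entries, because they are not in the list it reads.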
Hi David,

Thanks for starting this discussion. I have some questions and inputs.

1. Solution 1 is just plain compression. We get the benefit of a smaller size, but we will still face reliability issues under concurrency, so this can be -1.

2. In solution 2, writing and reading separate files is a pretty good idea, as it avoids many of the issues I mentioned in point 1. You mentioned a new format. My understanding is that you will have a new file that contains the list of all table status files, like "statusFileName":"status-uuid1","status-uuid2",.., and that you store the "status-uuid1" files in metadata. Am I right? If so, is your plan to read this new format and then go to the actual files? When do you merge all these files, and what is the threshold for them? That is, on what basis do you decide to create a new status file?

3. Solution 3: what is the obvious benefit of writing a delta file? Whenever I query, we need to read all the status entries and decide the valid segments, right? I don't think we get any benefit here; correct me if my understanding is wrong.

4. Keeping in-progress segments in another file is a better idea; with this we can avoid some unnecessary validations in many operations. But we need to decide which solution to combine it with. Once my doubts are cleared, I can suggest some.

*Suggestion/Idea:*
The table status file now holds many details, but in most cases we do not read or require all of them. Can we have an abstraction layer, or a status on top of the actual status, together with some of the optimizations above, so that we read less data, or only the required data, especially during query?

Regards,
Akash
Hi Akash,

2. The new tablestatus file only stores the latest status file name, not all status files. The status file stores all segment metadata (just like the old tablestatus).

3. If we have a delta file, there is no need to read the status file for each query. Reading only the delta file is enough if the status file has not changed.

-----
Best Regards
David Cai
Hi David,
After discussing with you it is a little clearer. Let me summarize in a few lines.

*Goals*
1. Reduce the size of the status file (which reduces the overall size by some MBs).
2. Make the table status file less prone to failures, and make reading fast.

*For the above goals with your solutions*
1. Use a compressor to compress the table status file, so that the read happens in memory and is faster.
2. To make it less prone to failure, *+1 for solution 3*, which can be combined with a little bit of solution 2 (for the new table status format and the trace folder structure) and the delta file of solution 3. This separates read and write, so that reads are faster and failures are avoided, which improves reliability.

Suggestion: one more point is to maintain a cache of the details after the first read instead of reading every time. Only once the status-uuid is updated do we read again; until then we read from the cache. This will make reads faster and help our queries.

I suggest you create a *JIRA and prepare a design document*; there we can cover many impact areas and *avoid fixing small bugs after implementation.*
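The caching suggestion above could look roughly like the following Python sketch. It assumes the small tablestatus file is re-read on every operation (as in solution 2) and that a changed "statusFileName" is the only signal for reloading. The class name `StatusCache` and the two callables passed to it are hypothetical.

```python
class StatusCache:
    """Cache the segment list and reload it only when the status file name changes."""

    def __init__(self, read_pointer, read_status):
        self._read_pointer = read_pointer  # returns the small tablestatus dict
        self._read_status = read_status    # loads the segment list for a file name
        self._cached_name = None
        self._cached_segments = None

    def segments(self):
        # The pointer file is small, so it is cheap to read on every call.
        name = self._read_pointer()["statusFileName"]
        if name != self._cached_name:
            # A new status-uuid file was published: do the expensive read once.
            self._cached_segments = self._read_status(name)
            self._cached_name = name
        return self._cached_segments
```

Because every write in solution 2 generates a new status-uuid file, comparing file names is enough here; no timestamp or checksum comparison is needed in this sketch.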
Hi David,
a) Compressing the table status is good, but we need to check the decompression overhead and how much overall benefit we get.
b) I suggest keeping multiple 10MB (or configurable size) files, then reading them in a distributed way.
c) Once all the table status files are read, it is better to cache them at the driver in a multilevel hash map [the first level being the status of the segment and the second level the segment id].

Thanks,
Ajantha

On Fri, Sep 4, 2020 at 10:19 AM akashrn5 <[hidden email]> wrote:
> Hi David,
>
> After discussing with you its little bit clear, let me just summarize in
> some lines
> [...]
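The multilevel hash map from point c) can be sketched in Python as below: the first level is keyed by segment status and the second by segment id. The function names, the "segmentId" and "status" fields, and the status strings are hypothetical; this is only one way such a driver-side cache could be laid out.

```python
from collections import defaultdict


def build_index(segments):
    """Build {status: {segment_id: entry}} from a flat list of segment entries."""
    index = defaultdict(dict)
    for entry in segments:
        index[entry["status"]][entry["segmentId"]] = entry
    return index


def move_segment(index, segment_id, old_status, new_status):
    """Move one segment between status buckets, e.g. when a load finishes."""
    entry = index[old_status].pop(segment_id)
    entry = dict(entry, status=new_status)
    index[new_status][segment_id] = entry
    return entry
```

With this layout a query can fetch all valid segments from a single bucket, for example `index["Success"]`, without scanning entries in other states.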