[Discussion] Improve the reading/writing performance on the big tablestatus file

[Discussion] Improve the reading/writing performance on the big tablestatus file

David CaiQiang
[Background]
Currently one segment metadata entry is about 200 bytes in the
tablestatus file. If the table has 1 million segments and the mean segment
size is 1 GB (that is, the table size is 1 PB), the tablestatus file will
reach 200 MB.

Any read/write operation on this tablestatus file will be costly and
perform badly.

In concurrent scenarios, this easily leads to read failures on a
tablestatus file that is being modified, and to write-lock wait timeouts.

[Motivation & Goal]
Carbon should support big tables larger than 1 PB, so we need to reduce the
tablestatus size to improve the performance of read/write operations.
It would also be better to separate reads and writes into different
tablestatus files, to avoid reading a tablestatus file that is being modified.

[Modification]
There are three candidate solutions, as follows.

solution 1: compress the tablestatus file
  1) use gzip to compress the tablestatus file (200 MB -> 20 MB)
  2) keep all the previous lock mechanisms
  3) support backward compatibility (a read-path sketch follows)
    Read: if the gzip magic number (0x1F8B) is present, decompress the
tablestatus file first
    Write: always compress the tablestatus file directly.
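
A minimal sketch of the backward-compatible read path (assuming the caller
supplies an InputStream for the tablestatus file; this is an illustration,
not the actual CarbonData reader):

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class TableStatusReader {
  // Reads tablestatus content, handling both legacy (plain JSON) and new
  // (gzip-compressed) files by peeking at the first two bytes.
  public static String readTableStatus(InputStream raw) throws IOException {
    BufferedInputStream in = new BufferedInputStream(raw);
    in.mark(2);
    int b1 = in.read();
    int b2 = in.read();
    in.reset();
    // gzip magic number: 0x1F 0x8B
    InputStream source = (b1 == 0x1F && b2 == 0x8B)
        ? new GZIPInputStream(in) : in;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    for (int n; (n = source.read(buffer)) != -1; ) {
      out.write(buffer, 0, n);
    }
    return out.toString("UTF-8");
  }
}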

solution 2: Based on solution 1, separate reading and writing into
different tablestatus files.
  1) new tablestatus format
    {
     "statusFileName":"status-uuid1",
     "updateStatusFileName":"updatestatus-timestamp1",
     "historyStatusFileName":"status.history",
     "segmentMaxId":"1000"
    }
    keep this file small at all times, and reload it for each operation

  2) add a Metadata/tracelog folder
    to store the files: status-uuid1, updatestatus-timestamp1, status.history

  3) use gzip to compress the status-uuid1 file

  4) support backward compatibility
    Read: if the file starts with "[{", go to the old reading flow; if it
starts with "{", go to the new flow.
    Write: generate a new status-uuid1 file and updatestatus file, and store
their names in the tablestatus file (a write-flow sketch follows the list)
 
  5) clean stale files
     if a stale file was created more than 1 hour ago (the query timeout), we
can remove it. Loading/compaction/clean-file operations can trigger this
action.
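
A rough sketch of the solution 2 write flow (file layout and helper names
are assumptions for illustration): the big status file is written first
under a fresh name, and the small pointer file is switched last, so readers
never observe a half-written status file.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

public class TableStatusWriter {
  public static void writeNewStatus(Path metadataDir, byte[] gzippedSegments,
                                    long segmentMaxId) throws Exception {
    Path tracelog = metadataDir.resolve("tracelog");
    Files.createDirectories(tracelog);

    // 1) Write the full (compressed) segment list under a fresh name.
    String statusFileName = "status-" + UUID.randomUUID();
    Files.write(tracelog.resolve(statusFileName), gzippedSegments);

    // 2) Build the tiny pointer file; rewriting it is always cheap.
    String pointer = "{\n"
        + " \"statusFileName\":\"" + statusFileName + "\",\n"
        + " \"updateStatusFileName\":\"updatestatus-"
        + System.currentTimeMillis() + "\",\n"
        + " \"historyStatusFileName\":\"status.history\",\n"
        + " \"segmentMaxId\":\"" + segmentMaxId + "\"\n"
        + "}";
    Path tmp = metadataDir.resolve("tablestatus.tmp");
    Files.write(tmp, pointer.getBytes(StandardCharsets.UTF_8));
    // 3) Atomic rename repoints readers to the new status file in one step
    //    (atomicity depends on the underlying file system).
    Files.move(tmp, metadataDir.resolve("tablestatus"),
        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }
}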

solution 3: Based on solution 2, support tablestatus delta
  1) new tablestatus file format
    {
     "statusFileName":"status-uuid1",
     "deltaStatusFileName": "status-uuid2.delta",
     "updateStatusFileName":"updatestatus-timestamp1",
     "historyStatusFileName":"status.history",
     "segmentMaxId":"1000"
    }
  2) the tablestatus delta stores the recent modifications

    Write: once the status file reaches 10 MB, start writing a delta file;
once the delta file reaches 1 MB, merge the delta into the status file and
set deltaStatusFileName to null.

    Read: if deltaStatusFileName is not null in the new tablestatus file,
read the delta status and combine the status file with it (a merge sketch
follows).
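
A sketch of the read-side merge (the SegmentEntry type and the segment-id
merge key are assumptions): delta entries override base entries, since the
delta holds the most recent modifications.

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeltaStatusReader {
  // Hypothetical minimal entry: only the fields needed for the merge.
  static class SegmentEntry {
    String segmentId;
    String status;
  }

  static Map<String, SegmentEntry> combine(List<SegmentEntry> base,
                                           List<SegmentEntry> delta) {
    Map<String, SegmentEntry> merged = new LinkedHashMap<>();
    for (SegmentEntry e : base) {
      merged.put(e.segmentId, e);
    }
    for (SegmentEntry e : delta) {
      merged.put(e.segmentId, e); // a newer delta entry wins
    }
    return merged;
  }
}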

Please vote on these solutions.



-----
Best Regards
David Cai

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

Zhangshunyu
solution 3, +1



-----
My English name is Sunday

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

David CaiQiang
In reply to this post by David CaiQiang
Adding solution 4, to separate the status file by segment status:

*solution 4:* Based on solution 2, support a status.inprogress file

  1) new tablestatus file format
    {
     "statusFileName":"status-uuid1",
     "inProgressStatusFileName": "status-uuid2.inprogess",
     "updateStatusFileName":"updatestatus-timestamp1",
     "historyStatusFileName":"status.history",
     "segmentMaxId":"1000"
    }

  2) the status.inprogress file stores the in-progress segment metadata
(a lifecycle sketch follows)

    Write: at the beginning of loading/compaction, add the in-progress
segment metadata into status-uuid2.inprogress; at the end, move it to
status-uuid1.

    Read: queries read status-uuid1 only. Other cases read
status-uuid2.inprogress if needed.
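
A sketch of the solution 4 lifecycle (entry type, helper names, and status
strings are illustrative assumptions; persistence of the two files is left
out):

import java.util.Iterator;
import java.util.List;

public class InProgressStatusManager {
  static class SegmentEntry {
    String segmentId;
    String status; // e.g. "INSERT_IN_PROGRESS", "SUCCESS"
  }

  // Start of loading/compaction: record the segment only in the
  // in-progress file, so queries reading status-uuid1 never see it.
  static void beginLoad(List<SegmentEntry> inProgressFile, String segmentId) {
    SegmentEntry e = new SegmentEntry();
    e.segmentId = segmentId;
    e.status = "INSERT_IN_PROGRESS";
    inProgressFile.add(e);
  }

  // End of loading: move the entry from the in-progress file into the
  // main status file with its final status.
  static void finishLoad(List<SegmentEntry> inProgressFile,
                         List<SegmentEntry> statusFile, String segmentId) {
    for (Iterator<SegmentEntry> it = inProgressFile.iterator(); it.hasNext(); ) {
      SegmentEntry e = it.next();
      if (e.segmentId.equals(segmentId)) {
        it.remove();
        e.status = "SUCCESS";
        statusFile.add(e);
        return;
      }
    }
  }
}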



-----
Best Regards
David Cai

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

akashrn5
In reply to this post by David CaiQiang
Hi David,

Thanks for starting this discussion. I have some questions and inputs.

1. Solution 1 is just plain compression: we get the benefit of size, but we
will still face reliability issues under concurrency. So it can be -1.

2. Solution 2:
Writing and reading to separate files is a pretty good idea, to avoid the
issues I mentioned in point 1.
You mentioned a new format. My understanding is that you will have a new
file which contains a list of all
status files, like "statusFileName":"status-uuid1","status-uuid2",..., and
you store the "status-uuid1" files in the metadata folder.
Am I right?

If I am, then your plan is to read this new format and then go to the
actual files, right?
When do you merge all these files, and what is the threshold for them? That
is, on what basis do you decide to create a new status file?

3. Solution 3:
What is the obvious benefit we get from writing a delta file?
Whenever I query, we still need to read all the status entries and decide
the valid segments, right?

I don't think we get any benefit here; correct me if my understanding is
wrong.


4. Keeping in-progress segments in a separate file is a better idea; with
this we can avoid some unnecessary validations in many operations. But we
need to decide which solution to combine this with; maybe once my doubts
are cleared, I can suggest some.

*Suggestion/Idea:* Now the table status file holds many details, but in
most cases we do not read or require all of them. Can we have some
abstraction layer, or a status on top of the actual status with the above
optimizations, so that we read less (only the required) data, especially
during queries?

Regards,
Akash




Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

David CaiQiang
Hi Akash,

2. The new tablestatus only stores the latest status file name, not all
status files.
   The status file will store all segment metadata (just like the old
tablestatus).

3. If we have a delta file, there is no need to read the status file for
each query; reading only the delta file is enough if the status file has
not changed.




-----
Best Regards
David Cai

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

akashrn5
Hi David,

After discussing with you it is a bit clearer; let me summarize in a few
lines.

*Goals*
1. reduce the size of the status file (which cuts the overall size by some
MBs)
2. make the table status file less prone to failures, and make reads fast

*For the above goals with your solutions*

1. use the compressor to compress the table status file, so that reads
happen in memory and are faster
2. to make it less prone to failure, *+1 for solution3*, which can be
combined with a bit of solution2 (the new table status format and tracelog
folder structure) and solution3's delta file, to separate reads and writes
so that reads are faster and reliability failures are avoided.

Suggestion: one more point is to maintain a cache of the details after the
first read. Instead of reading every time, we re-read only once the
status-uuid is updated; until then we read from the cache. This will make
reads faster and help our queries.

I suggest you create a *JIRA and prepare a design document*; there we can
cover many impact areas and *avoid fixing small bugs after implementation.*




Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

Ajantha Bhat
Hi David,

a) Compressing the table status is good, but we need to check the
decompression overhead and how much overall benefit we can get.
b) I suggest we keep multiple 10 MB files (or a configurable size), then
read them in a distributed way.
c) Once all the table status files are read, it is better to cache them at
the driver with a multilevel hash map [the first level being the status of
the segment, the second level the segment id]; a sketch follows.
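
A minimal sketch of such a two-level cache in Java (DriverStatusCache and
the Object payload are stand-ins for whatever metadata entry class is
actually used):

import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DriverStatusCache {
  // First level: segment status (e.g. "SUCCESS", "MARKED_FOR_DELETE");
  // second level: segment id -> metadata entry. A query that only needs
  // valid segments touches just the "SUCCESS" bucket.
  private final Map<String, Map<String, Object>> cache =
      new ConcurrentHashMap<>();

  public void put(String status, String segmentId, Object details) {
    cache.computeIfAbsent(status, s -> new ConcurrentHashMap<>())
         .put(segmentId, details);
  }

  public Map<String, Object> segmentsWithStatus(String status) {
    return cache.getOrDefault(status, Collections.emptyMap());
  }
}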

Thanks,
Ajantha
