The size of the tablestatus file is getting larger, does it impact the performance of reading this file?


xm_zzc
Hi dev:
  The tablestatus file keeps growing. Does its size affect the performance
of reading it, for example when it holds 1 million segment entries? Many
places scan this file.
  Why not delete the invisible segment entries to reduce the size of the
tablestatus file? Will they be used later? Can we delete the invisible
segments in the method 'SegmentStatusManager.deleteLoadsAndUpdateMetadata'?



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: The size of the tablestatus file is getting larger, does it impact the performance of reading this file?

Jacky Li
Hi,

Yes, I think you are right. Currently the CLEAN FILES command only deletes the segment data folder and does not delete the metadata entries in the table_status file. I think this is the problem.
Please feel free to open a JIRA ticket and improve it. Thanks.

Regards,
Jacky

> On 14 March 2018, at 10:28 AM, xm_zzc <[hidden email]> wrote:
>
> Hi dev:
>  The tablestatus file keeps growing. Does its size affect the performance
> of reading it, for example when it holds 1 million segment entries? Many
> places scan this file.
>  Why not delete the invisible segment entries to reduce the size of the
> tablestatus file? Will they be used later?

Re: The size of the tablestatus file is getting larger, does it impact the performance of reading this file?

sraghunandan
Dear Jacky,
This was done on purpose. The table status needs to provide the history of
the transactions that happened on the system. It serves as an audit point.

Dear xm_zzc,
What is your use case?

In any case, we cannot permanently remove the entries from our system. Based
on the use case, we can consider moving them to a separate file. We can also
check what the size would be and optimise reading it from multiple places.

Regards
Raghu
On Wed, 14 Mar 2018 at 12:18 PM, Jacky Li <[hidden email]> wrote:

> Hi,
>
> Yes, I think you are right. Currently the CLEAN FILES command only deletes
> the segment data folder and does not delete the metadata entries in the
> table_status file. I think this is the problem.
> Please feel free to open a JIRA ticket and improve it. Thanks.
>
> Regards,
> Jacky

Re: The size of the tablestatus file is getting larger, does it impact the performance of reading this file?

xm_zzc
Hi Jacky, Raghunandan S:
  Thanks for your replies.
  I am currently working on PR2045. This PR automatically deletes the
segment lock files when the method
'SegmentStatusManager.deleteLoadsAndUpdateMetadata' is executed, and it scans
the 'tablestatus' file to decide which segment lock files need to be deleted.
Ravindra Pesala is concerned about the performance of reading the tablestatus
file as its size grows, so I want to know whether we can reduce the size of
the tablestatus file.
  Following Raghunandan S's suggestion, I think we can *append* the
invisible segment list to a file called 'tablestatus.history' each time the
command 'CLEAN FILES FOR TABLE' is executed, separating visible and invisible
segments into two files. If 'SHOW SEGMENTS FOR TABLE' later needs to list all
segments (both visible and invisible), it just needs to read from the two
files. Is it OK to do so?
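
A minimal sketch of the split proposed above, for illustration only. It is
not CarbonData code: the function name, the 'segment_id' and 'visible'
fields, and the one-JSON-object-per-line layout of 'tablestatus.history' are
all assumptions, and the locking a real implementation would need around the
status file is left out.

```python
import json
import os
import tempfile


def archive_invisible_segments(status_path, history_path):
    """Move invisible entries from the status file into the history file.

    Hypothetical helper: returns the number of entries moved.
    """
    with open(status_path) as f:
        entries = json.load(f)

    visible = [e for e in entries if e["visible"]]
    invisible = [e for e in entries if not e["visible"]]
    if not invisible:
        return 0

    # Append to the history file first (one JSON object per line), so a
    # crash between the two writes duplicates an entry rather than losing it.
    with open(history_path, "a") as f:
        for entry in invisible:
            f.write(json.dumps(entry) + "\n")

    # Rewrite the status file through a temporary file and a rename, so
    # readers never see a half-written file.
    tmp_path = status_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(visible, f)
    os.replace(tmp_path, status_path)
    return len(invisible)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    status = os.path.join(workdir, "tablestatus")
    history = os.path.join(workdir, "tablestatus.history")
    with open(status, "w") as f:
        json.dump([
            {"segment_id": "0", "visible": False},
            {"segment_id": "1", "visible": True},
        ], f)
    moved = archive_invisible_segments(status, history)
    print(moved, "entry moved to", history)
```

Because the history file is only ever appended to, each run costs time
proportional to the current status file, not to the full history.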




Re: The size of the tablestatus file is getting larger, does it impact the performance of reading this file?

manishgupta88
I think maintaining a tablestatus backlog file is a good idea. It will also
help us quickly filter valid segments during query execution, which involves
reading the table status file, as the number of segments increases.

The SHOW SEGMENTS DDL can read both files to display the output.
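
A rough sketch of that read path, for illustration only. It assumes a
hypothetical layout in which the status file is a JSON array and the backlog
file holds one JSON object per line; the function name and the 'segment_id'
field are also made up and are not CarbonData's API.

```python
import json
import os
import tempfile


def list_segments(status_path, history_path, include_history=False):
    """Return current segments, optionally preceded by archived ones."""
    with open(status_path) as f:
        segments = json.load(f)
    # Queries only need the small status file; the history file is read
    # only when a full listing is requested.
    if not include_history or not os.path.exists(history_path):
        return segments
    with open(history_path) as f:
        archived = [json.loads(line) for line in f if line.strip()]
    return archived + segments


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    status = os.path.join(workdir, "tablestatus")
    history = os.path.join(workdir, "tablestatus.history")
    with open(status, "w") as f:
        json.dump([{"segment_id": "2"}], f)
    with open(history, "w") as f:
        f.write(json.dumps({"segment_id": "0"}) + "\n")
        f.write(json.dumps({"segment_id": "1"}) + "\n")
    for segment in list_segments(status, history, include_history=True):
        print(segment["segment_id"])
```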

Regards
Manish Gupta

On Thu, 15 Mar 2018 at 10:19 AM, xm_zzc <[hidden email]> wrote:

> Hi Jacky, Raghunandan S:
>   Thanks for your replies.
>   I am currently working on PR2045. This PR automatically deletes the
> segment lock files when the method
> 'SegmentStatusManager.deleteLoadsAndUpdateMetadata' is executed, and it
> scans the 'tablestatus' file to decide which segment lock files need to be
> deleted. Ravindra Pesala is concerned about the performance of reading the
> tablestatus file as its size grows, so I want to know whether we can reduce
> the size of the tablestatus file.
>   Following Raghunandan S's suggestion, I think we can *append* the
> invisible segment list to a file called 'tablestatus.history' each time the
> command 'CLEAN FILES FOR TABLE' is executed, separating visible and
> invisible segments into two files. If 'SHOW SEGMENTS FOR TABLE' later needs
> to list all segments (both visible and invisible), it just needs to read
> from the two files. Is it OK to do so?

Re: The size of the tablestatus file is getting larger, does it impact the performance of reading this file?

Jacky Li
Hi,

I think this approach (maintaining a history tablestatus file) is good.
xm_zzc, please continue with this approach if you want to work on it.

Regards,
Jacky

> On 15 March 2018, at 1:47 PM, manish gupta <[hidden email]> wrote:
>
> I think maintaining a tablestatus backlog file is a good idea. It will also
> help us quickly filter valid segments during query execution, which involves
> reading the table status file, as the number of segments increases.
>
> The SHOW SEGMENTS DDL can read both files to display the output.
>
> Regards
> Manish Gupta