Re: Improving show segment info

Posted by Jacky Li on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improving-show-segment-info-tp91874p91934.html



> On Feb 17, 2020, at 2:00 PM, akashrn5 <[hidden email]> wrote:
>
> Hi,
>
>>> *1. How about creating a "tableName.segmentInfo" child table for each main
>>> table?* Users can query this table, and it is easy to support filter and
>>> group by. We just have to finalize the schema of this table.
> We already have many tables, such as index tables and datamap tables. There
> is no need to create another table just to store this metadata; maintaining
> it would be difficult. Moreover, SHOW SEGMENTS is not a frequently run
> query, so it is better not to go for this.

I agree. Initially I thought of the approach Ajantha suggested (adding a table to store the segment info), but then I realized this work is equivalent to refactoring the table status file so that it is stored in a database. That requires more effort, and we decided not to do it in the current phase.

We can do it the way Ajantha suggested after the table status file has been moved.


>
>>> 2. To find out which segments each partition is mapped to: currently we
>>> don't store this information anywhere. So where are you planning to store
>>> it? I don't think we need to calculate it every time.
>
> We already have a mapping: the table status file contains the load name
> and the segment file, and the corresponding segment file contains the
> partition info with its location.

Yes, the segment file has the partition mapping info.
The new problem is: will it be very slow to read a lot of segment files? Suppose there are more than 5000 segment files; how can we execute SHOW SEGMENTS faster?
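
One possible direction, as a rough sketch only: read the segment files concurrently and build the partition-to-segment mapping in a single pass. The code below is not CarbonData code. The function name build_partition_index, the loader callable, and the shape of the inputs (segment id -> segment file path, and a loader that returns the partition names found in one file) are all assumptions made for illustration. It only shows the idea that the per-file reads are independent and can be overlapped.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_partition_index(segment_files, loader, max_workers=16):
    """Map each partition name to the sorted list of segment ids containing it.

    segment_files: dict of segment id -> segment file path
                   (hypothetical; stands in for what the table status file lists)
    loader:        callable(path) -> iterable of partition names
                   (hypothetical; stands in for parsing one segment file)
    """
    index = defaultdict(list)
    seg_ids = list(segment_files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so zip pairs
        # each result with the segment id it was read for.
        results = pool.map(loader, (segment_files[s] for s in seg_ids))
        for seg_id, partitions in zip(seg_ids, results):
            for partition in partitions:
                index[partition].append(seg_id)
    return {partition: sorted(ids) for partition, ids in index.items()}


if __name__ == "__main__":
    # In-memory stand-in for segment files on disk (illustrative data only).
    fake_store = {
        "seg_0.segment": ["country=IN", "country=US"],
        "seg_1.segment": ["country=US"],
    }
    files = {"0": "seg_0.segment", "1": "seg_1.segment"}
    print(build_partition_index(files, fake_store.__getitem__))
```

Whether threads help in practice depends on where the segment files live and how expensive each read is. Caching the resulting mapping until the table status file changes would be another option to consider.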

>
> Regards,
> Akash R Nilugal