Hi,
My suggestions are:
1. Reduce the number of calls to readTableStatusFile as much as possible in both loading and query.
2. A cache may be added inside SegmentStatusManager for LoadMetadataDetails, and cache invalidation should be done carefully, for example when a table is dropped.
3. Do compaction periodically in your application to merge small segments and reduce the number of segments. After compaction, only a small number of "compacted" segment entries remain in the table status file; the other "compacted" segment entries are moved to the history table status file. See carbon.invisible.segments.preserve.count in
http://carbondata.apache.org/configuration-parameters.html
If you want to work on it, you are welcome to submit JIRA and PRs.
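To make point 2 a bit more concrete, here is a rough sketch of what such a cache could look like. It is only an illustration, not the SegmentStatusManager API: the class TableStatusCache and the two functions passed to it (lastModifiedOf, reader) are hypothetical names. The assumption is that the modification time of the table status file can be checked more cheaply than reading and parsing the whole file, and that it changes on every write.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** Hypothetical cache for parsed table status content; not part of CarbonData. */
final class TableStatusCache<T> {

  /** A parsed result together with the file timestamp it was read at. */
  private static final class Entry<T> {
    final long lastModified;
    final T details;

    Entry(long lastModified, T details) {
      this.lastModified = lastModified;
      this.details = details;
    }
  }

  private final Map<String, Entry<T>> entries = new ConcurrentHashMap<>();
  // Assumed helper: returns the modification time of the table status file.
  private final ToLongFunction<String> lastModifiedOf;
  // Assumed helper: the expensive read + parse of the table status file.
  private final Function<String, T> reader;

  TableStatusCache(ToLongFunction<String> lastModifiedOf, Function<String, T> reader) {
    this.lastModifiedOf = lastModifiedOf;
    this.reader = reader;
  }

  /** Returns the cached details, re-reading only if the file timestamp changed. */
  T get(String tableStatusPath) {
    long current = lastModifiedOf.applyAsLong(tableStatusPath);
    Entry<T> entry = entries.compute(tableStatusPath, (path, cached) ->
        (cached != null && cached.lastModified == current)
            ? cached
            : new Entry<>(current, reader.apply(path)));
    return entry.details;
  }

  /** Explicit invalidation, e.g. to be called when the table is dropped. */
  void invalidate(String tableStatusPath) {
    entries.remove(tableStatusPath);
  }
}
```

With something like this, repeated calls during one load would hit the cache as long as the file is unchanged. Note that a timestamp check can miss two writes that land within the file system's timestamp granularity, so a real implementation would need a stronger change check or explicit invalidation on every write.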
Regards,
Jacky
On 2019/11/08 09:07:29, Chin Wei <[hidden email]> wrote:
> Hi Community,
>
> I notice that as the number of segments increases, the time taken to load
> data increases as well.
> After checking, whenever we load 1 csv file, it calls readLoadMetadata 9
> times. For a table with 10,000 segments, each readLoadMetadata call takes
> 50ms.
>
> Is there any plan to improve this, or any area that I can look at to improve
> it?
>
> Regards,
> Chin Wei