Apache CarbonData Dev Mailing List archive

Data Load performance degrade when number of segment increase

Chin Wei

Nov 08, 2019; 9:07am

Data Load performance degrade when number of segment increase

Hi Community,

I have noticed that as the number of segments in a table increases, the time
taken to load data increases as well.
After checking, I found that loading a single CSV file calls readLoadMetadata
9 times. For a table with 10,000 segments, each readLoadMetadata call took
50 ms.
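
As a rough worked estimate of the overhead described above, assuming the 9 calls run one after another and each costs the same 50 ms (an assumption; only the per-call figure was measured):

```latex
t_{\text{metadata}} \approx 9 \times 50\,\text{ms} = 450\,\text{ms per single-file load}
```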

Is there any plan to improve this, or is there an area I can look at to
improve it?

Regards,
Chin Wei

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Jacky Li-3

Nov 12, 2019; 3:44am

Re: Data Load performance degrade when number of segment increase

Hi,

My suggestion is:
1. Reduce the number of calls to readTableStatusFile as much as possible, in both loading and query.
2. A cache for LoadMetadataDetails could be added inside SegmentStatusManager. Cache invalidation must be handled carefully, for example when a table is dropped.
3. Run compaction periodically in your application to merge small segments and reduce the number of segments. After compaction, only a small number of "compacted" segment entries remain in the table status file; the other "compacted" segment entries are moved to the history table status file. See carbon.invisible.segments.preserve.count in http://carbondata.apache.org/configuration-parameters.html
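
To make suggestion 2 concrete, here is a minimal sketch of one way such a cache could work. It is not CarbonData code: the class name TableStatusCache, its methods, and the loader function are all hypothetical, and it assumes the table status file lives on a file system where modification time and size are cheap to query. The cached value is reused only while both are unchanged, and invalidate() covers cases like drop table.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch, not part of CarbonData: caches parsed table status content per file. */
final class TableStatusCache<T> {

    /** Parsed value plus the file attributes observed just before it was loaded. */
    private static final class Entry<T> {
        final long lastModifiedMillis;
        final long sizeBytes;
        final T value;

        Entry(long lastModifiedMillis, long sizeBytes, T value) {
            this.lastModifiedMillis = lastModifiedMillis;
            this.sizeBytes = sizeBytes;
            this.value = value;
        }
    }

    private final ConcurrentHashMap<Path, Entry<T>> entries = new ConcurrentHashMap<>();
    private final Function<Path, T> loader;                    // assumed: reads and parses the file
    private final AtomicInteger loads = new AtomicInteger();   // counts real reads, for inspection

    TableStatusCache(Function<Path, T> loader) {
        this.loader = loader;
    }

    /** Returns the cached value if the file looks unchanged, otherwise reloads it. */
    T get(Path statusFile) {
        try {
            // Attributes are read BEFORE loading, so a concurrent write can only
            // cause one extra reload later, never a stale hit.
            long modified = Files.getLastModifiedTime(statusFile).toMillis();
            long size = Files.size(statusFile);
            Entry<T> cached = entries.get(statusFile);
            if (cached != null
                    && cached.lastModifiedMillis == modified
                    && cached.sizeBytes == size) {
                return cached.value;
            }
            T value = loader.apply(statusFile);
            loads.incrementAndGet();
            entries.put(statusFile, new Entry<>(modified, size, value));
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Explicit invalidation, e.g. when the table is dropped. */
    void invalidate(Path statusFile) {
        entries.remove(statusFile);
    }

    /** Number of times the loader actually ran. */
    int loadCount() {
        return loads.get();
    }
}
```

With a cache like this, repeated reads during one load would hit the file only once as long as it is unchanged. A rewrite that keeps both the size and the modification timestamp identical would go undetected, so a real implementation would likely need a stronger change signal or explicit invalidation on every write.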

If you want to work on it, you are welcome to file a JIRA and submit PRs.

Regards,
Jacky
