carbondata insert job has only one task

陈星宇
Hi,

I am writing data into a CarbonData table from a Parquet table with Spark SQL: 'insert into carbondata_table select * from parquet_table'. The insert job always runs with only one task, which makes it very slow.
I tried setting spark.default.parallelism = 1000, but that only increased the number of query tasks.
There are more than 500 Parquet files.
How can I get better performance when inserting into a CarbonData table?

Thanks
ChenXingYu
Re: carbondata insert job has only one task

sraghunandan
Hi chenxingyu,
How many executors do you have?
Can you check how many select tasks are launched to read from the Parquet table?
Also, can you check the number of tasks that are created if you do a CTAS into a Hive table?

Regards
Raghu
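
The checks asked for above could be scripted roughly as follows. This is only a sketch, not something from the thread: it assumes PySpark with an existing SparkSession that has Hive support enabled, and the scratch table name `hive_ctas_check` is hypothetical. The partition count of the select is used as a stand-in for the number of read tasks; the executor count and the task count of the CTAS write are easiest to read off the Spark UI.

```python
"""Diagnostics sketch for the checks suggested above (table names are hypothetical)."""


def diagnostic_sql(source_table: str, scratch_table: str) -> dict:
    """Build the statements to run; pure string handling, no Spark needed."""
    select_sql = f"SELECT * FROM {source_table}"
    return {
        # Read side of the insert: its partition count approximates the
        # number of select tasks Spark schedules against the Parquet files.
        "select": select_sql,
        # CTAS into a Hive-format table, so the write-side task count can be
        # compared in the Spark UI with the CarbonData insert.
        "ctas": f"CREATE TABLE {scratch_table} STORED AS PARQUET AS {select_sql}",
    }


def run_diagnostics(spark, source_table: str, scratch_table: str) -> dict:
    """Run the checks on an existing SparkSession (Hive support assumed)."""
    stmts = diagnostic_sql(source_table, scratch_table)
    report = {
        "default_parallelism": spark.sparkContext.defaultParallelism,
        "select_partitions": spark.sql(stmts["select"]).rdd.getNumPartitions(),
    }
    # Creates the scratch table; inspect this job's stages in the Spark UI.
    spark.sql(stmts["ctas"])
    return report
```

Calling `run_diagnostics(spark, "parquet_table", "hive_ctas_check")` returns the two numbers as a dict. If the select shows many partitions and the CTAS also writes with many tasks, the single task is more likely specific to the CarbonData load path than to the Spark configuration.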

On Tue, 19 Jun 2018, 5:45 pm 陈星宇, <[hidden email]> wrote:

> hi ,
>
>
> i wrote data into carbondata table from parquet table by spark_sql 'insert
> into carbondata_table select * from parquet_table', the task number is
> always only one.
> it caused the insert job was very slow .
> i tried increase spark.default.parallelism = 1000, but only increase query
> task.
> the parquet files are more than 500.
> how can i get better performance when insert into carbondata table.
>
>
> THANKS
> ChenXingYu