CarbonData insert job has only one task


陈星宇
Hi,

I am loading data into a CarbonData table from a Parquet table with Spark SQL ('insert into carbondata_table select * from parquet_table'), but the insert job always runs with only one task.
This makes the insert job very slow.
I tried increasing spark.default.parallelism to 1000, but that only increased the number of tasks for the query.
There are more than 500 Parquet files.
How can I get better performance when inserting into a CarbonData table?

Thanks,
ChenXingYu

Re: CarbonData insert job has only one task

sraghunandan
Hi ChenXingYu,
How many executors do you have?
Can you check how many tasks are launched for the select query on the Parquet table?
Also, can you check how many tasks are created if you do a CTAS into a Hive table?
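The checks suggested above could be run from the Spark SQL shell roughly as follows. This is a sketch under stated assumptions, not a verified procedure: it assumes a session with Hive support enabled, reuses parquet_table from the original post, and uses hive_copy as a hypothetical scratch table name. The task count for each statement is shown in the Spark UI's Stages tab, and the executor count in its Executors tab.

```sql
-- Assumption: Spark session with Hive support; hive_copy is a hypothetical
-- scratch table used only for comparison.

-- 1. Scan the Parquet source. The task count of the scan stage in the
--    Spark UI shows how many read tasks the source produces.
SELECT COUNT(*) FROM parquet_table;

-- 2. CTAS into a Hive table. Compare the task count of this write with the
--    single task seen for the CarbonData INSERT.
CREATE TABLE hive_copy STORED AS PARQUET AS SELECT * FROM parquet_table;

-- Remove the scratch table after comparing.
DROP TABLE IF EXISTS hive_copy;
```

If the scan and the CTAS both run with many tasks while the CarbonData insert still uses one, the bottleneck is likely on the CarbonData write side rather than in reading the Parquet files.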

Regards
Raghu

On Tue, 19 Jun 2018, 5:45 pm, 陈星宇 wrote the original message above.