[DISCUSS] Change task distribution mechanism

Jacky Li
Hi All,

Currently, in the CarbonData Spark integration module, CarbonScanRDD overrides Spark's task distribution mechanism. This was required in older versions of CarbonData. In the V1 and V2 formats the blocklets in each file are small, so distributing Spark tasks according to the number of blocklets improves task parallelism.

However, this feature is not needed for the V3 format. Blocklets are now much larger, so we gain little from it, and it makes the code very complex. Furthermore, the carbon layer should not be manipulating executor allocation at all.
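To make the parallelism argument concrete, here is a minimal sketch that compares one task per file with one task per blocklet. It is a toy model, not CarbonScanRDD code: the helper names (`tasks_per_file`, `tasks_per_blocklet`, `parallelism_gain`, `make_table`) are hypothetical, and the file and blocklet sizes are made-up illustrative numbers, not CarbonData defaults.

```python
# Hypothetical toy model: a table is a list of files, and each file is a list
# of blocklet sizes in MB. Sizes used below are illustrative only.

def tasks_per_file(files):
    """One Spark task per file/block."""
    return len(files)

def tasks_per_blocklet(files):
    """One Spark task per blocklet (the blocklet-level distribution)."""
    return sum(len(blocklets) for blocklets in files)

def parallelism_gain(files):
    """How many times more tasks blocklet-level distribution produces."""
    return tasks_per_blocklet(files) / tasks_per_file(files)

def make_table(num_files, file_mb, blocklet_mb):
    """Build num_files files, each split evenly into blocklets of blocklet_mb."""
    per_file = file_mb // blocklet_mb
    return [[blocklet_mb] * per_file for _ in range(num_files)]

# Small blocklets (V1/V2-like, illustrative): 4 files x 256 MB, 2 MB blocklets.
small = make_table(4, 256, 2)
# Large blocklets (V3-like, illustrative): 4 files x 256 MB, 64 MB blocklets.
large = make_table(4, 256, 64)

print(tasks_per_file(small), tasks_per_blocklet(small), parallelism_gain(small))  # 4 512 128.0
print(tasks_per_file(large), tasks_per_blocklet(large), parallelism_gain(large))  # 4 16 4.0
```

Under these assumed sizes, the extra tasks from blocklet-level distribution drop from 128x to 4x as blocklets grow, which is the kind of shrinking benefit the proposal weighs against the added code complexity.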

So I suggest removing this feature.

Regards,
Jacky Li

Re: [DISCUSS] Change task distribution mechanism

cenyuhai11
+1

Best regards!
Yuhai Cen


(In reply to Jacky Li's message of 30 October 2017, 13:07.)