[DISCUSS] Change task distribution mechanism

Posted by Jacky Li on Oct 30, 2017; 5:07am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-Change-task-distribution-mechanism-tp25153.html

Hi All,

Currently, in CarbonScanRDD in the CarbonData Spark integration module, Carbon overrides Spark's task distribution mechanism. This was required in older versions of Carbon: in the V1 and V2 formats the blocklets in a file are small, so distributing Spark tasks according to the number of blocklets improves task parallelism.

However, this feature is not required for the V3 format. Blocklets are now much bigger, so the feature brings little benefit while making the code very complex. Furthermore, it is not good for the Carbon layer to manipulate even executor allocation.
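
To make the trade-off concrete, here is a rough sketch of how the task count changes between the two approaches. It is only an illustration, not the CarbonScanRDD code: the class name, the method names, and the blocklet counts per file are all hypothetical, and it ignores locality and executor allocation entirely.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in CarbonData.
public class TaskCountSketch {

    // One task per blocklet: the task count is the sum of blocklets over all files.
    public static int tasksPerBlocklet(List<Integer> blockletsPerFile) {
        int total = 0;
        for (int count : blockletsPerFile) {
            total += count;
        }
        return total;
    }

    // One task per file (block): the task count is simply the number of files.
    public static int tasksPerBlock(List<Integer> blockletsPerFile) {
        return blockletsPerFile.size();
    }

    public static void main(String[] args) {
        // Assumed numbers: many small blocklets per file (V1/V2-like layout).
        List<Integer> smallBlocklets = Arrays.asList(100, 100, 100, 100);
        // Assumed numbers: few large blocklets per file (V3-like layout).
        List<Integer> largeBlocklets = Arrays.asList(4, 4, 4, 4);

        System.out.println("small blocklets, per-blocklet tasks: " + tasksPerBlocklet(smallBlocklets));
        System.out.println("small blocklets, per-block tasks:    " + tasksPerBlock(smallBlocklets));
        System.out.println("large blocklets, per-blocklet tasks: " + tasksPerBlocklet(largeBlocklets));
        System.out.println("large blocklets, per-block tasks:    " + tasksPerBlock(largeBlocklets));
    }
}
```

With the assumed numbers, splitting by blocklet multiplies the task count by 100 in the small-blocklet layout but only by 4 in the large-blocklet layout, which is why the extra complexity buys much less for V3.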

So I suggest removing this feature.

Regards,
Jacky Li