Carbon over-uses cluster resources
Posted by Manhua Jiang on Apr 02, 2020; 12:30pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Carbon-over-use-cluster-resources-tp94332.html
Hi All,
Recently, I found that Carbon over-uses cluster resources. In general, a Carbon task does not behave like a common Spark task that does one small piece of work in a single thread; instead, each task spawns extra threads according to its own internal logic.
For example,
1. Launch Carbon with --num-executors=1 but set carbon.number.of.cores.while.loading=10;
2. For a no_sort table with multi-block input, say N Iterator&lt;CarbonRowBatch&gt;, Carbon starts N tasks in parallel. In each task, CarbonFactDataHandlerColumnar creates model.getNumberOfCores() (let's say C) threads in its ProducerPool, so N*C threads are launched in total. ==> This is the case that makes me treat this as a serious problem: too many threads stall the executor's heartbeat and it gets killed. A minimal sketch of this multiplication follows this list.
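To make the multiplication concrete, here is a minimal, standalone Java sketch. It is not Carbon's actual code, and the values of N and C are assumptions; it only shows how N concurrent tasks, each building its own pool of C producer threads, leave roughly N*C live threads in one executor JVM.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: each "task" builds its own producer pool of C
// threads, similar in spirit to sizing a ProducerPool from
// model.getNumberOfCores(). With N tasks running concurrently in one JVM,
// roughly N * C worker threads are alive at the same time.
public class ThreadMultiplicationSketch {
    static final int N = 8;   // concurrent loading tasks on one executor (assumed)
    static final int C = 10;  // carbon.number.of.cores.while.loading (assumed)

    public static void main(String[] args) throws InterruptedException {
        ExecutorService taskRunner = Executors.newFixedThreadPool(N);
        for (int t = 0; t < N; t++) {
            taskRunner.submit(() -> {
                // Each task creates its own producer pool instead of sharing one.
                ExecutorService producerPool = Executors.newFixedThreadPool(C);
                for (int p = 0; p < C; p++) {
                    producerPool.submit(() -> {
                        try {
                            TimeUnit.SECONDS.sleep(5); // simulate producer work
                        } catch (InterruptedException ignored) {
                        }
                    });
                }
                producerPool.shutdown();
            });
        }
        TimeUnit.SECONDS.sleep(1);
        // Approximate count of live threads in this thread group: close to N * C.
        System.out.println("Approximate live threads: " + Thread.activeCount());
        taskRunner.shutdown();
    }
}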
So the over-use comes down to how thread pools are used.
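As an illustration only (the class name and API below are hypothetical, not Carbon's, and this is not a concrete proposal), one way to keep thread usage bounded would be for all loading tasks in the same executor JVM to share a single pool sized by the configured core count, rather than each task building its own pool:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one JVM-wide producer pool shared by all tasks,
// capped at the configured number of cores. Names are illustrative.
public final class SharedProducerPool {
    private static volatile ExecutorService pool;

    private SharedProducerPool() {
    }

    // Lazily create a single pool of at most 'cores' threads for this JVM.
    public static ExecutorService get(int cores) {
        if (pool == null) {
            synchronized (SharedProducerPool.class) {
                if (pool == null) {
                    pool = Executors.newFixedThreadPool(cores);
                }
            }
        }
        return pool;
    }
}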
Such over-use affects the cluster's overall resource usage and may lead to misleading performance results.
I hope this gets your attention when fixing existing code or writing new code.