
Re: Carbon over-use cluster resources

Posted by Manhua Jiang on Apr 20, 2020; 11:43am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Carbon-over-use-cluster-resources-tp94332p95126.html

Hi Ajantha,
If we look at this problem from the opposite direction, carbon may also waste resources if users do not set the properties correctly.

What about the case of concurrent loading?

So first of all, we need to figure out where and how many executor services are used. If we keep the logic of one node one task, we need to bound the overall number of running threads within a task.
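As a rough illustration of bounding the threads within one task (a hypothetical sketch, not CarbonData's actual code; the class and method names are my own), all steps of a loading task could draw from one shared pool sized by the configured core count, instead of each step creating its own pool on top of it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: bound ALL worker threads of one loading task with a
// single shared pool sized from carbon.number.of.cores.while.loading,
// so the task as a whole can never exceed that thread count.
public class BoundedTaskPool {
    // Runs `tasks` small units of work on a pool of `cores` threads and
    // returns how many completed.
    static int runBounded(int cores, int tasks) throws Exception {
        ExecutorService shared = Executors.newFixedThreadPool(cores);
        AtomicInteger done = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            // reading, converting and writing steps would all submit here
            futures.add(shared.submit(done::incrementAndGet));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for completion
        }
        shared.shutdown();
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed=" + runBounded(4, 10));
    }
}
```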

Then, some further thoughts:
Is a global executor service possible? That may cause some dependencies between different steps of loading.
Are multiple executor services, one for each step of loading (or another unit), possible? Could such a step-specific executor service change its size? (e.g. once local-sort is done, most threads could work on writing and none on input reading and converting)
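The resizing idea above can be sketched with `ThreadPoolExecutor`, whose `setCorePoolSize`/`setMaximumPoolSize` allow changing pool sizes at runtime (a hypothetical sketch with made-up class and method names, not a proposal for the actual code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: two per-step pools whose sizes are rebalanced at
// runtime, e.g. once local-sort finishes, its cores move to the writer step.
public class ResizablePools {
    // Shrinks the sort pool to 1 thread and grows the write pool to `cores`
    // threads; returns {sortCoreSize, writeCoreSize} after resizing.
    static int[] rebalance(int cores) {
        ThreadPoolExecutor sortPool = new ThreadPoolExecutor(
            cores, cores, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        ThreadPoolExecutor writePool = new ThreadPoolExecutor(
            1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // shrink: lower the core size before the max size (max >= core)
        sortPool.setCorePoolSize(1);
        sortPool.setMaximumPoolSize(1);
        // grow: raise the max size before the core size
        writePool.setMaximumPoolSize(cores);
        writePool.setCorePoolSize(cores);

        int[] sizes = {sortPool.getCorePoolSize(), writePool.getCorePoolSize()};
        sortPool.shutdown();
        writePool.shutdown();
        return sizes;
    }

    public static void main(String[] args) {
        int[] s = rebalance(4);
        System.out.println("sort=" + s[0] + " write=" + s[1]);
    }
}
```

One caveat of this approach: shrinking a pool does not interrupt tasks already running in it, so the rebalance only takes full effect as the old step's tasks drain.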

BTW, do you know why the configuration "carbon.number.of.cores.while.loading" was born?




On 2020/04/15 13:54:50, Ajantha Bhat <[hidden email]> wrote:

> Hi Manhua,
>
> For No sort and Local sort only, we don't follow the spark task launch logic.
> We have our own logic of one node one task, and inside that task we can
> control resources via configuration (carbon.number.of.cores.while.loading).
>
> As you pointed out in the above mail, *N * C is controlled by configuration*
> and the default value of C is 2.
> *I see the over-use cluster problem only if you configure it badly.*
>
> Do you have any suggestion to change the design? Feel free to raise a
> discussion and work on it.
>
> Thanks,
> Ajantha
>
> On Tue, Apr 14, 2020 at 6:06 PM Liang Chen <[hidden email]> wrote:
>
> > OK, thank you for reporting this issue, let us look into it.
> >
> > Regards
> > Liang
> >
> >
> > Manhua Jiang wrote
> > > Hi All,
> > > Recently, I found carbon over-uses cluster resources. Generally the design
> > > of the carbon work flow does not act like a common spark task, which only does
> > > one small piece of work in one thread; instead the task has its own mind/logic.
> > >
> > > For example,
> > > 1.launch carbon with --num-executors=1 but set
> > > carbon.number.of.cores.while.loading=10;
> > > 2.no_sort table with multi-block input, N Iterator
> > > <CarbonRowBatch>
> > >  for example, carbon will start N tasks in parallel. And in each task the
> > > CarbonFactDataHandlerColumnar has model.getNumberOfCores() (let's say C)
> > > threads in the ProducerPool, launching N*C threads in total. ==> This is the
> > > case that makes me take this as a serious problem. Too many threads block
> > > the executor from sending heartbeats, and it gets killed.
> > >
> > > So, the over-use is related to the usage of thread pools.
> > >
> > > This would affect the overall cluster resource usage and may lead to
> > > wrong performance results.
> > >
> > > I hope this gets your notice when fixing or writing new code.
> >
> >
> >
> >
> >
> >
>