[Proposal] Proposal to change default value of two parameters for data loading


xuchuanyin
Hi all,

About a year ago, we introduced the 'multiple dirs for temp data' feature to
solve the disk hotspot problem in data loading.

This feature lets Carbon randomly pick one of the local directories
configured in yarn-local-dirs whenever it writes temp files to disk (for
example, sort temp files and fact data files).
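
A minimal sketch of the selection step described above. `TempDirPicker` and `pickTempDir` are hypothetical names, not CarbonData's actual code. The sketch assumes the directory list has already been read from the YARN local-dir configuration, and the paths below are made up. Each call picks one directory uniformly at random, so over many writes the temp files spread across all configured disks instead of piling onto one.

```java
import java.io.File;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class TempDirPicker {
    // Hypothetical helper (not CarbonData's API): choose one configured local
    // directory at random so consecutive temp files land on different disks.
    static File pickTempDir(List<String> localDirs) {
        if (localDirs.isEmpty()) {
            throw new IllegalArgumentException("no local directories configured");
        }
        int idx = ThreadLocalRandom.current().nextInt(localDirs.size());
        return new File(localDirs.get(idx));
    }

    public static void main(String[] args) {
        // Made-up paths standing in for the YARN local dirs on one node.
        List<String> dirs = List.of("/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local");
        for (int i = 0; i < 5; i++) {
            // Each temp file gets its own randomly chosen parent directory.
            File target = new File(pickTempDir(dirs), "sorttemp_" + i + ".tmp");
            System.out.println(target);
        }
    }
}
```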

After about a year of use in production environments, this feature has proven
to be effective and correct. So I propose enabling the related parameters by
default.

The related parameters are:

1. `carbon.use.local.dir`: currently `false` by default; I propose changing
the default to `true`.

2. `carbon.use.multiple.temp.dir`: currently `false` by default; I propose
changing the default to `true`.
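
To illustrate what changing the defaults means in practice, here is a sketch that uses plain `java.util.Properties` as a stand-in for Carbon's own property handling (the class `LoadingDirDefaults` is hypothetical, not CarbonData code). The second key is spelled `carbon.use.multiple.temp.dir`, which is assumed to be the property's actual name in CarbonData. With the new defaults, a cluster that sets neither key gets both features enabled, while an explicit `false` still opts out.

```java
import java.util.Properties;

public class LoadingDirDefaults {
    // Property keys discussed in the proposal (second name assumed, see above).
    static final String USE_LOCAL_DIR = "carbon.use.local.dir";
    static final String USE_MULTIPLE_TEMP_DIR = "carbon.use.multiple.temp.dir";

    // Proposed defaults: both switches on unless explicitly disabled.
    static final String USE_LOCAL_DIR_DEFAULT = "true";
    static final String USE_MULTIPLE_TEMP_DIR_DEFAULT = "true";

    // An unset key falls back to the default; a set key always wins.
    static boolean isEnabled(Properties props, String key, String defaultValue) {
        return Boolean.parseBoolean(props.getProperty(key, defaultValue).trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Nothing configured: the new default applies.
        System.out.println(isEnabled(props, USE_LOCAL_DIR, USE_LOCAL_DIR_DEFAULT)); // prints true
        // An explicit "false" still disables the feature.
        props.setProperty(USE_MULTIPLE_TEMP_DIR, "false");
        System.out.println(isEnabled(props, USE_MULTIPLE_TEMP_DIR, USE_MULTIPLE_TEMP_DIR_DEFAULT)); // prints false
    }
}
```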



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Proposal] Proposal to change default value of two parameters for data loading

xm_zzc
Hi Chuanyin,
  +1 for this. One question: these two parameters only apply in on-YARN
mode, right? When a user runs the application outside YARN, can Carbon be
configured to use paths other than /tmp?




Re: [Proposal] Proposal to change default value of two parameters for data loading

xuchuanyin
Yes. Supporting that requires further changes: an additional property is
needed, in which multiple directories can be configured.
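
A rough sketch of how such an additional property might be resolved, assuming it holds a comma-separated list of directories. The property name `carbon.hypothetical.temp.dirs` and the class `NonYarnTempDirs` are hypothetical; the thread does not name the property. When nothing is configured, the sketch falls back to the JVM's `java.io.tmpdir`, which is usually `/tmp` on Linux.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class NonYarnTempDirs {
    // HYPOTHETICAL property name; the actual name was not decided in this thread.
    static final String TEMP_DIRS_KEY = "carbon.hypothetical.temp.dirs";

    // Parse a comma-separated directory list, skipping blank entries.
    static List<String> resolveTempDirs(Properties props) {
        List<String> dirs = new ArrayList<>();
        for (String part : props.getProperty(TEMP_DIRS_KEY, "").split(",")) {
            String dir = part.trim();
            if (!dir.isEmpty()) {
                dirs.add(dir);
            }
        }
        if (dirs.isEmpty()) {
            // Nothing configured: fall back to the JVM temp dir (typically /tmp).
            dirs.add(System.getProperty("java.io.tmpdir"));
        }
        return dirs;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(TEMP_DIRS_KEY, "/disk1/tmp,/disk2/tmp");
        System.out.println(resolveTempDirs(props)); // prints [/disk1/tmp, /disk2/tmp]
    }
}
```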




Re: [Proposal] Proposal to change default value of two parameters for data loading

Jacky Li
In reply to this post by xuchuanyin
+1


> On Oct 15, 2018, at 9:03 PM, xuchuanyin <[hidden email]> wrote: