Apache CarbonData Dev Mailing List archive

column auto mapping when loading data from csv file


李寅威

Mar 13, 2017; 2:18am

column auto mapping when loading data from csv file

Hi all,

When loading data from a CSV file into a CarbonData table, we have 2 ways to map the columns of the CSV file to the columns of the table:

1. add the column names as the first line of the CSV file
2. declare the column mapping in the data loading script

Shall we add a feature that, by default, maps the columns automatically by position (the n-th column of the CSV file to the n-th column of the CarbonData table), so that in most cases users no longer have to do either of the above?
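
To make the comparison concrete, here is a minimal Python sketch of the two existing choices next to the proposed positional mapping. It is not CarbonData code: the table columns, the sample rows, and the function names `map_by_header` and `map_by_position` are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical table schema, listed in DDL order.
TABLE_COLUMNS = ["id", "name", "city"]


def map_by_header(csv_text, header=None):
    """Map CSV fields to columns by name.

    If `header` is None, the first row of the file is used as the header
    (choice 1); otherwise the caller-supplied header is used (choice 2).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if header is None:
        header, rows = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in rows]


def map_by_position(csv_text, table_columns):
    """Proposed behaviour: assume CSV fields follow the table's column order."""
    rows = csv.reader(io.StringIO(csv_text))
    return [dict(zip(table_columns, row)) for row in rows]


with_header = "id,name,city\n1,alice,shenzhen\n"
without_header = "1,alice,shenzhen\n"

by_file_header = map_by_header(with_header)                       # choice 1
by_script_header = map_by_header(without_header,
                                 header=["id", "name", "city"])   # choice 2
by_position = map_by_position(without_header, TABLE_COLUMNS)      # proposal
```

All three calls produce the same row here only because the CSV fields happen to be in DDL order. With a differently ordered file, positional mapping would load values into the wrong columns without any error.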

manishgupta88

Mar 13, 2017; 5:06am

Re: column auto mapping when loading data from csv file

Hi Yinwei,

Thanks for this suggestion. In my opinion, providing the first 2 options
ensures that the user is aware of the data he is going to load and of how
its columns are mapped.

With the 3rd option you suggest, I think we would be making that decision
without informing the user, and we cannot be sure that this is exactly how
the user wants to load the data. So in my opinion we should let the user
decide this behavior.

Regards
Manish Gupta


ravipesala

Mar 13, 2017; 5:43am

Re: column auto mapping when loading data from csv file

Hi Yinwei,

I also feel it is a little cumbersome to force the user to add the header
to the CSV file or to the loading script.

But what Manish said is also true. I think we should come up with a new
option in the loading script to enable auto mapping of DDL columns to CSV
columns. If the user knows that the DDL columns and the CSV file columns
are in the same order, he can specify it as below:
LOAD DATA INPATH INTO TABLE OPTIONS('AUTOFILEHEADER'='true')
When the user specifies this, the load takes all DDL columns, in order, as
the file header. We can discuss this option further. Others, please comment
on it.

Regards,
Ravindra.


--
Thanks & Regards,
Ravi

李寅威

Mar 13, 2017; 7:02am

Re: column auto mapping when loading data from csv file

It's a good idea to add a new option to the loading script. Any further comments from others?


David CaiQiang

Mar 13, 2017; 8:42am

Re: column auto mapping when loading data from csv file

In reply to this post by ravipesala

Hi Ravindra,
How about using 'NOT_AUTOFILEHEADER'='true' instead, as follows?
I think 'AUTOFILEHEADER'='true' should be the default behavior.

if (load SQL contains "FILEHEADER") {
    1. input files shouldn't contain a file header
    2. use the "FILEHEADER" parameter to load data after it passes the column check
} else {
    if ('NOT_AUTOFILEHEADER' option does not exist) {
        1. auto-map the first row of the input files to the table's columns
        if (the first row contains all column names) {
            2. use the first row as the file header to load data
        } else if (the first row contains only some of the column names) {
            2. stop loading
        } else {
            2. use the original order of the table's columns to load data
        }
    } else {
        1. input files should contain a file header
        2. use the first row as the file header to load data after it passes the column check
    }
}
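
The flow above can be sketched in Python as follows. This is only an illustration of the proposal, not existing CarbonData behavior: the function name `resolve_file_header`, the `options` dict, and the reading of "column check" as "every header name must be a table column" are assumptions, and `NOT_AUTOFILEHEADER` is the option name proposed in this thread.

```python
def resolve_file_header(table_columns, first_row, options):
    """Return (header, skip_first_row) following the proposed flow.

    table_columns: column names in DDL order.
    first_row:     fields of the first line of the input file.
    options:       load options as a dict with upper-case keys (hypothetical).
    """
    table_set = {c.lower() for c in table_columns}

    def check_columns(header):
        # Assumed "column check": every header name must be a table column.
        header = [c.strip() for c in header]
        unknown = [c for c in header if c.lower() not in table_set]
        if unknown:
            raise ValueError(f"unknown columns in header: {unknown}")
        return header

    if "FILEHEADER" in options:
        # The file has no header row; the option supplies it.
        return check_columns(options["FILEHEADER"].split(",")), False

    if "NOT_AUTOFILEHEADER" not in options:
        # Default: try to auto-map the first row to the table's columns.
        names = [f.strip() for f in first_row]
        matched = {n.lower() for n in names} & table_set
        if matched == table_set:
            return names, True            # first row is a full header
        if matched:
            raise ValueError("first row contains only some column names; "
                             "stopping load")
        return list(table_columns), False  # no header: use DDL order

    # Auto mapping disabled: the first row must be a valid header.
    return check_columns(first_row), True
```

For example, with `table_columns = ["id", "name", "city"]`, a first row of `["1", "alice", "shenzhen"]` and no options falls through to the DDL order, while a first row such as `["id", "alice", "x"]` stops the load because it matches only part of the schema.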

Best Regards
David Cai

Jacky Li

Mar 14, 2017; 1:50pm

Re: column auto mapping when loading data from csv file

In reply to this post by 李寅威

Hi Yinwei,

I am OK with this new feature if there is an option in the load script to enable it, so that the user can explicitly enable it if he wants, without changing the current 2 choices.

Regards,
Jacky
