column auto mapping when loading data from csv file

column auto mapping when loading data from csv file

李寅威
Hi all,


  When loading data from a CSV file into a CarbonData table, we have 2 ways to map the columns of the CSV file to the columns of the table:

  1. add the column names as a header row at the start of the CSV file
  2. declare the column mapping in the data loading script

  Shall we add a feature that, by default, maps the columns automatically by position, i.e. in the order in which they appear in the CSV file and in the CarbonData table? Then users would not have to do either of the above in most circumstances.
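
To make the three cases concrete, here is a minimal Python sketch of the two existing choices and the proposed positional mapping. It is only an illustration, not CarbonData code: the table schema, the sample rows, and the function names `rows_with_header` and `rows_by_position` are all hypothetical.

```python
import csv
import io

# Hypothetical table schema, in DDL order.
TABLE_COLUMNS = ["id", "name", "city"]


def rows_with_header(csv_text, fileheader=None):
    """Current choices: the header comes either from the load script
    (fileheader) or from the first row of the file."""
    reader = csv.reader(io.StringIO(csv_text))
    header = fileheader if fileheader is not None else next(reader)
    return [dict(zip(header, row)) for row in reader]


def rows_by_position(csv_text, table_columns):
    """Proposed behavior: no header anywhere; the i-th CSV field is
    assigned to the i-th table column."""
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(table_columns, row)) for row in reader]


# Choice 1: header row inside the file.
in_file = rows_with_header("name,id,city\nAnn,1,Pune\n")
# Choice 2: header declared in the load script.
in_script = rows_with_header("Ann,1,Pune\n", fileheader=["name", "id", "city"])
# Proposal: rely on the table's column order.
by_position = rows_by_position("1,Ann,Pune\n", TABLE_COLUMNS)
```

In the first two calls the column names travel with the data or with the script, so the file may list columns in any order; in the third call the file has to follow the table's column order.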

Re: column auto mapping when loading data from csv file

manishgupta88
Hi Yinwei,

Thanks for this suggestion. In my opinion, providing the first 2 options
ensures that the user is aware of the data he is going to load and of how
the columns map to that data.

With the 3rd option you suggest, I think we would be making the decision
without informing the user, and we cannot be sure that this is exactly how
the user wanted to load the data. So in my opinion we should let the user
decide this behavior.
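
A small Python illustration of this concern, using a hypothetical two-column table and a made-up row: if the file was written in a different column order than the table, positional mapping still succeeds and simply puts the values under the wrong names.

```python
# Hypothetical DDL order of the table.
table_columns = ["id", "name"]

# The file was actually written as name,id, with no header row.
csv_row = ["Ann", "1"]

# Positional mapping pairs the i-th value with the i-th table column.
mapped = dict(zip(table_columns, csv_row))

# No error is raised here: "Ann" ends up under "id" and "1" under "name".
swapped = mapped["id"] == "Ann" and mapped["name"] == "1"
```

Whether such a load would fail later (for example on a type check) depends on the column types; with two string columns nothing would flag it.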

Regards
Manish Gupta

In reply to Yinwei Li's post of Mon, Mar 13, 2017 at 7:48 AM.

Re: column auto mapping when loading data from csv file

ravipesala
Hi Yinwei,

Even I feel it is a little cumbersome to force the user to add the header
to the CSV file or to the loading script.

But what Manish said is also true. I think we should come up with a new
option in the loading script to accept auto mapping of DDL columns and CSV
columns. If the user knows that the DDL columns and the CSV file columns are
in the same order, then he may specify something like the below:
 LOAD DATA INPATH INTO TABLE OPTIONS('AUTOFILEHEADER'='true')
 When the user specifies this, the load can take all DDL columns as the
 file header.
Maybe we can have more discussion on this option. Others, please comment on
it.
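
As a rough sketch of how such an option could choose the header, the following Python function picks between the three sources. 'AUTOFILEHEADER' is only the option proposed in this mail, not an existing one; the function name `header_for_load`, the precedence between the options, and the comma-separated format of 'FILEHEADER' are assumptions made for the illustration.

```python
def header_for_load(options, ddl_columns, first_row):
    """Pick the column header for a load (hypothetical helper).

    options     -- dict of load options, e.g. {"AUTOFILEHEADER": "true"}
    ddl_columns -- table columns in DDL order
    first_row   -- fields of the first row of the CSV file
    """
    # Proposed option: take all DDL columns, in order, as the file header.
    if options.get("AUTOFILEHEADER", "false").lower() == "true":
        return list(ddl_columns)
    # Header declared in the loading script (assumed comma-separated).
    if "FILEHEADER" in options:
        return [name.strip() for name in options["FILEHEADER"].split(",")]
    # Otherwise the first row of the file is expected to be the header.
    return list(first_row)


ddl = ["id", "name", "city"]
auto = header_for_load({"AUTOFILEHEADER": "true"}, ddl, ["1", "Ann", "Pune"])
```

Because the option has to be written out in the load statement, the user still makes the decision explicitly, which is the point Manish raised.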

Regards,
Ravindra.

In reply to manish gupta's post of 13 March 2017 at 10:36.



--
Thanks & Regards,
Ravi

Re: column auto mapping when loading data from csv file

李寅威
It's a good idea to add a new option to the loading script. Any more thoughts from others?




In reply to Ravindra Pesala's post of Monday, March 13, 2017, 2:13 PM.

Re: column auto mapping when loading data from csv file

David CaiQiang
In reply to this post by ravipesala
Hi Ravindra,
    How about using 'NOT_AUTOFILEHEADER'='true', as follows?
    I think 'AUTOFILEHEADER'='true' should be the default behavior.

   if (load SQL contains "FILEHEADER") {
     1. input files shouldn't contain a file header
     2. use the "FILEHEADER" parameter to load data after passing the column check
   } else {
     if ('NOT_AUTOFILEHEADER' option does not exist) {
       1. auto-map the first row of the input files to the table's columns
       if (the first row contains all column names) {
         2. use the first row as the file header to load data
       } else if (the first row contains only some of the column names) {
         2. stop loading
       } else {
         2. use the original order of the table's columns to load data
       }
     } else {
       1. input files should contain a file header
       2. use the first row as the file header to load data after passing the column check
     }
   }
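
The decision procedure above could be sketched in Python roughly as follows. This is one reading of the pseudocode, not an implementation from CarbonData: the names `plan_load` and `check_columns` are hypothetical, the "column check" is assumed to mean "every header name is a table column", 'FILEHEADER' is assumed to be comma-separated, and names are compared exactly.

```python
def check_columns(header, table_columns):
    """Assumed column check: every header name must be a table column."""
    unknown = [name for name in header if name not in table_columns]
    if unknown:
        raise ValueError(f"unknown columns in header: {unknown}")


def plan_load(options, table_columns, first_row):
    """Return (header, first_row_is_header) or raise ValueError to stop."""
    first = [field.strip() for field in first_row]

    if "FILEHEADER" in options:
        # Header comes from the load SQL; the files carry no header row.
        header = [name.strip() for name in options["FILEHEADER"].split(",")]
        check_columns(header, table_columns)
        return header, False

    if "NOT_AUTOFILEHEADER" not in options:
        # Default: try to match the first row against the table's columns.
        matched = [name for name in table_columns if name in first]
        if len(matched) == len(table_columns):
            return first, True            # first row names every column
        if matched:
            raise ValueError("first row names only some table columns")
        return list(table_columns), False  # no names found: use table order

    # NOT_AUTOFILEHEADER given: the first row must be a valid header.
    check_columns(first, table_columns)
    return first, True


cols = ["id", "name"]
header, skip_first = plan_load({}, cols, ["1", "Ann"])
```

For a headerless file such as the one in the last two lines, the default branch falls back to the table's own column order and treats the first row as data.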
Best Regards
David Cai

Re: column auto mapping when loading data from csv file

Jacky Li
In reply to this post by 李寅威
Hi Yinwei,

I am OK with this new feature if there is an option in the load script to enable it, so that the user can explicitly enable it if he wants, without changing the current 2 choices.

Regards,
Jacky
