Hi all,
When loading data from a CSV file into a CarbonData table, we have two ways to map the CSV columns to the table columns:

1. add the column names as the first line of the CSV file
2. declare the column mapping in the data loading script

Shall we add a feature that, by default, maps the columns automatically according to the order of the columns in the CSV file and in the CarbonData table, so that users don't have to do either of the above in most cases?
Hi Yinwei,
Thanks for this suggestion. In my opinion, providing the first 2 options ensures that the user is aware of the data he is going to load and of how the columns map to that data.

With the 3rd option you suggest, I think we would be making the decision without informing the user, and we cannot be sure that this is exactly how the user wanted to load the data. So in my opinion we should let the user decide this behavior.

Regards
Manish Gupta

On Mon, Mar 13, 2017 at 7:48 AM, Yinwei Li <[hidden email]> wrote:
> Hi all,
>
> when loading data from a csv file to carbondata table, we have 2 choices
> to mapping the columns from csv file to carbondata table:
>
> 1. add columns' names at the start of the csv file
> 2. declare the column mapping at the data loading script
>
> shall we add a feature which make an auto mapping in the order of the
> columns at the csv file and the carbondata table at default, so that users
> don't have to do the above jobs any more under most of the circumstance.
Hi Yinwei,
I also feel it is a little cumbersome to force the user to add the header either to the CSV file or to the loading script. But what Manish said is also true. I think we should come up with a new option in the loading script to accept auto mapping of DDL columns and CSV columns.

If the user knows that the DDL columns and the CSV file columns are in the same order, then he may specify it like below:

    LOAD DATA INPATH INTO TABLE OPTIONS('AUTOFILEHEADER'='true')

When the user specifies this, the load can take all DDL columns as the file header.

Maybe we can have more discussion on this option. Others, please comment on it.

Regards,
Ravindra.

On 13 March 2017 at 10:36, manish gupta <[hidden email]> wrote:
> Hi Yinwei,
>
> Thanks for this suggestion. From my opinion providing first 2 options
> ensures that user is aware about the data he is going to load and column
> data mapping.
>
> For the 3rd option suggested by you I think it will be something that we
> are taking the decision without intimating the user and we cannot be sure
> that this is exactly how user wanted to load the data. So from my opinion
> we should let user decide this behavior.
>
> Regards
> Manish Gupta

--
Thanks & Regards,
Ravi
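To make the proposal above concrete, here is a minimal Python sketch of what an opt-in 'AUTOFILEHEADER' option could mean: when it is set to 'true', the table's DDL columns, in declared order, are used as the file header and every CSV row is mapped by position. This is only an illustration of the idea, not CarbonData code; the function name `load_rows`, the `options` dictionary, and the error handling are all assumptions made for the example.

```python
import csv
import io


def load_rows(csv_text, table_columns, options):
    """Yield one dict per CSV data row, keyed by column name.

    Hypothetical helper: `options` stands in for the OPTIONS(...) clause
    of the load script, and `table_columns` for the DDL column order.
    """
    auto = options.get("AUTOFILEHEADER", "false").lower() == "true"
    reader = csv.reader(io.StringIO(csv_text))

    if auto:
        # Proposed behavior: take the DDL columns as the file header,
        # so the file itself carries data rows only.
        header = list(table_columns)
    else:
        # Existing choice 1: the first line of the file is the header.
        header = next(reader, None)
        if header is None:
            return  # empty file, nothing to load

    for row in reader:
        if len(row) != len(header):
            raise ValueError(
                f"row has {len(row)} fields, header has {len(header)}"
            )
        yield dict(zip(header, row))


if __name__ == "__main__":
    data = "1,alice\n2,bob\n"
    for record in load_rows(data, ["id", "name"], {"AUTOFILEHEADER": "true"}):
        print(record)
```

Because the mapping is purely positional, a file whose columns are in a different order than the DDL would load without error but into the wrong columns, which is the risk Manish raises and the reason for making the option explicit.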
It's a good idea to add a new option in the loading script. Any more thoughts from others?
------------------ Original Message ------------------
From: "Ravindra Pesala" <[hidden email]>
Sent: Monday, March 13, 2017, 2:13 PM
To: "dev" <[hidden email]>
Subject: Re: column auto mapping when loading data from csv file

Hi Yinwei,

Even I feel it is little cumbersome to let user forced to add the header to CSV file or to loading script. But what Manish said is also true. I think we should come with some new option in loading script to accept auto mapping of DDL columns and CSV columns.

If user knows that DDL columns and CSV file columns are in same order then he may mention like below

    LOAD DATA INPATH INTO TABLE OPTIONS('AUTOFILEHEADER'='true')

when user mention this then it can take all DDL columns as file header.

May be can have more discussion on this option. Please others comment on it.

Regards,
Ravindra.
In reply to this post by ravipesala
Hi Ravindra,
How about using 'NOT_AUTOFILEHEADER'='true' instead, as follows? I think 'AUTOFILEHEADER'='true' should be the default behavior.

    if (load SQL contains "FILEHEADER") {
        1. input files shouldn't contain a file header
        2. use the "FILEHEADER" parameter to load data after passing the column check
    } else {
        if ('NOT_AUTOFILEHEADER' option does not exist) {
            1. auto-map the first row of the input files to the table's columns
            if (the first row contains all column names) {
                2. use the first row as the file header to load data
            } else if (the first row contains only some of the column names) {
                2. stop loading
            } else {
                2. use the original order of the table's columns to load data
            }
        } else {
            1. input files should contain a file header
            2. use the first row as the file header to load data after passing the column check
        }
    }
Best Regards
David Cai
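The decision flow David outlines can be sketched in Python roughly as follows. This is a hedged illustration, not CarbonData code: the function names `resolve_header` and `check_columns` are hypothetical, the "column check" is assumed to mean that every header name must be a column of the table, the option is treated as present only when set to 'true', and name matching is assumed to be exact after trimming whitespace.

```python
def check_columns(header, table_columns):
    """Assumed column check: every header name must be a table column."""
    unknown = [name for name in header if name not in table_columns]
    if unknown:
        raise ValueError(f"unknown columns in header: {unknown}")


def resolve_header(table_columns, options, first_row):
    """Return (header, skip_first_row) following the proposed flow.

    `options` stands in for the OPTIONS(...) clause of the load script;
    `first_row` is the first line of the input file, already split.
    """
    table = list(table_columns)

    if "FILEHEADER" in options:
        # Explicit mapping: the file is expected to hold data rows only.
        header = [name.strip() for name in options["FILEHEADER"].split(",")]
        check_columns(header, table)
        return header, False

    first = [field.strip() for field in first_row]

    if options.get("NOT_AUTOFILEHEADER", "false").lower() == "true":
        # Auto mapping disabled: the first row must be a valid header.
        check_columns(first, table)
        return first, True

    # Default: inspect the first row to decide how to map.
    matched = set(first) & set(table)
    if matched == set(table):
        return first, True   # first row names all table columns
    if matched:
        # Only some column names found: ambiguous, so refuse to load.
        raise ValueError("first row names only some table columns")
    return table, False      # no names found: use the table's column order


if __name__ == "__main__":
    print(resolve_header(["id", "name"], {}, ["name", "id"]))
    print(resolve_header(["id", "name"], {}, ["1", "alice"]))
```

One consequence of this flow worth noting: a data row whose values happen to equal column names would be read as a header, so the default branch depends on data values and column names not colliding.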
In reply to this post by 李寅威
Hi Yinwei,
I am OK with this new feature if there is an option in the load script to enable it. That way the user can explicitly enable it if he wants, and the current 2 choices stay unchanged.

Regards,
Jacky

> On March 13, 2017, at 10:18 AM, Yinwei Li <[hidden email]> wrote:
>
> Hi all,
>
> when loading data from a csv file to carbondata table, we have 2 choices to mapping the columns from csv file to carbondata table:
>
> 1. add columns' names at the start of the csv file
> 2. declare the column mapping at the data loading script
>
> shall we add a feature which make an auto mapping in the order of the columns at the csv file and the carbondata table at default, so that users don't have to do the above jobs any more under most of the circumstance.