1. Background
a) Load data with the FILEHEADER option

load data inpath '<path>' into table <carbon_table_name> options('FILEHEADER'='col1,col2,col3')

This means we load CSV files that have no file header, so the FILEHEADER option is needed to specify it.

b) Load data without the FILEHEADER option

load data inpath '<path>' into table <carbon_table_name>

This means we load CSV files that have a file header, so the header found in the CSV files is used.

2. Issue

When we load CSV files without a file header and the file header would be the same as the table schema, we can combine all the table columns to form the file header. So I think it is unnecessary to make the user provide the file header.

3. Solution

Add a HEADER option to the load data SQL. The HEADER option can be true or false; the default value is true. When we load CSV files without a file header and the file header would be the same as the table schema, add 'header'='false' to the load data SQL.

Please vote:
+1: yes, agree to add the 'header' option
±0: abstain or no opinion
-1: no, veto this action; no need to add the 'header' option

Best Regards
David Cai
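
To make the proposed semantics concrete, here is a minimal Python sketch of how a loader might pick the column names under the three cases in the post above. It is not CarbonData code: the function `resolve_columns` and its parameters are hypothetical, and letting an explicit FILEHEADER take precedence over `header` is an assumption, since the proposal does not say what happens when both are given.

```python
from typing import List, Optional, Tuple


def resolve_columns(
    schema_columns: List[str],
    first_line: str,
    header: bool = True,              # proposed HEADER option, default true
    fileheader: Optional[str] = None  # existing FILEHEADER option, if given
) -> Tuple[List[str], bool]:
    """Return (column_names, first_line_is_data).

    Hypothetical helper for illustration only.
    """
    if fileheader is not None:
        # Case a): the CSV has no header row; the user supplies the names.
        # Assumption: FILEHEADER wins if 'header' is also set.
        return [c.strip() for c in fileheader.split(",")], True
    if header:
        # Case b): the first line of the CSV is the header.
        return [c.strip() for c in first_line.split(",")], False
    # Proposed case: 'header'='false', so the CSV has no header row
    # and the table schema order is used as the file header.
    return list(schema_columns), True
```

With `header=False` and no FILEHEADER, the schema columns are returned unchanged and the first line is treated as data, which is the case the proposal wants to make easier.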
Vote: +1

There is no need to make the user add the file header; I think it is unnecessary. The file header can be obtained from the table schema, and it is inconvenient for users to specify the file header option when the header is complex.

Regards,
Chenerlu
Hi erlu,

From what you said, you should vote +1. The 'header' option is different from the 'fileheader' option.

Best Regards
David Cai
In reply to this post by David CaiQiang
I propose loading the CSV files with an explicitly given table schema, while using an option to ignore the CSV header if there is one.
In reply to this post by David CaiQiang
Thanks for correcting me. I have updated my opinion.

Regards,
Chenerlu

> Re: [Discussion] Add HEADER option to load data sql
> Jul 03, 2017; 11:48pm, by David CaiQiang
>
> hi erlu,
> as you said, you should vote for +1. 'header' option is different with 'fileheader' option.
>
> Best Regards
> David Cai
In reply to this post by wangbin
+1

This will be useful when the CSV file header is the same as the table schema; in that case it is a pain for the user to pass the whole CSV header. But it depends completely on the user's scenario, that is, on how the CSV file is generated.

Regards,
Kumar Vishal

> On 04-Jul-2017, at 06:51, wangbin <[hidden email]> wrote:
>
> I propose the loading the CSV files by explicitly give a table schema,while
> using a option to ignore csv header if has.
In reply to this post by wangbin
I agree that the user need not provide column names if no header is present in the file and the column order is the same as the schema order. However, a single header=true option will not cover all the cases: header present, header not present, header overridden, and so on.

I have added an intermediate approach that covers all the cases and also takes care of the current default values and backward compatibility.

CSV file without header
1. FILEHEADER="col1,col2,col3", default IGNORE_FIRST_LINE="FALSE": uses the given header.
2. FILEHEADER="", default IGNORE_FIRST_LINE="FALSE": uses schema order.

CSV file with header
1. No FILEHEADER, default IGNORE_FIRST_LINE="FALSE": expects the first line of the CSV to be the header.
2. FILEHEADER="col1,col2,col3", IGNORE_FIRST_LINE="TRUE": uses the explicitly given header, ignoring the header from the file.
3. FILEHEADER="", IGNORE_FIRST_LINE="TRUE": uses schema order, ignoring the header from the file.

Regards,
Ramana
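
The five combinations in Ramana's reply can be read as a small decision table. The Python sketch below is one possible reading of it, not CarbonData code: `plan_load` and its parameters are hypothetical names, and the behaviour when IGNORE_FIRST_LINE is "TRUE" but no FILEHEADER is given is an assumption, because the reply does not list that combination.

```python
from typing import List, Optional, Tuple


def plan_load(
    schema_columns: List[str],
    first_line: str,
    fileheader: Optional[str] = None,  # None: option absent; "": use schema order
    ignore_first_line: bool = False    # IGNORE_FIRST_LINE, default FALSE
) -> Tuple[List[str], int]:
    """Return (column_names, number_of_leading_lines_to_skip).

    Hypothetical helper for illustration only.
    """
    if fileheader is None:
        # No FILEHEADER: the first CSV line is expected to be the header.
        # Assumption: IGNORE_FIRST_LINE makes no difference here, since the
        # header line is consumed as the header and never loaded as data.
        return [c.strip() for c in first_line.split(",")], 1
    if fileheader == "":
        # Empty FILEHEADER: fall back to the table schema order.
        names = list(schema_columns)
    else:
        # Explicit FILEHEADER: use exactly the names the user supplied.
        names = [c.strip() for c in fileheader.split(",")]
    # Skip the first line only when the file carries a header to be ignored.
    return names, 1 if ignore_first_line else 0
```

For example, `plan_load(schema, line, fileheader="", ignore_first_line=True)` corresponds to case 3 under "CSV file with header": schema order is used and the header line in the file is skipped.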