Apache CarbonData Dev Mailing List archive

[Discussion] Add HEADER option to load data sql

David CaiQiang

Jul 03, 2017; 8:32am

[Discussion] Add HEADER option to load data sql

1. Background

a) load data with FILEHEADER option
load data inpath '<path>' into table <carbon_table_name> options('FILEHEADER'='col1,col2,col3')

This form is for CSV files that do not contain a header line, so the FILEHEADER option is needed to specify the file header.

b) load data without FILEHEADER option
load data inpath '<path>' into table <carbon_table_name>

This form is for CSV files that do contain a header line, so the header from the CSV files is used.

2. Issue

When we load CSV files that have no file header and whose columns match the table schema, we can combine all the table columns to form the file header. So I think it is unnecessary to make the user provide the file header.

3. Solution

Add a HEADER option to the load data SQL.
The HEADER option can be true or false. The default value is true.
When we load CSV files that have no file header and whose columns match the table schema, add 'header'='false' to the load data SQL.
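
As a rough illustration of the rule proposed above, the following Python sketch picks the column names for a load. It is not CarbonData code: the function name `resolve_columns`, its parameters, and the choice to let an explicit FILEHEADER take precedence over HEADER are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def resolve_columns(
    schema_columns: List[str],
    first_line: str,
    header: bool = True,
    fileheader: Optional[str] = None,
) -> Tuple[List[str], bool]:
    """Return (column names, whether the first CSV line is a header to skip).

    Hypothetical helper, not part of CarbonData.
    """
    if fileheader is not None:
        # Existing behavior (1a): the file has no header line and the
        # user lists the columns explicitly. Precedence is an assumption.
        return [c.strip() for c in fileheader.split(",")], False
    if header:
        # Default (1b): the first line of the CSV file is the header.
        return [c.strip() for c in first_line.split(",")], True
    # Proposed 'header'='false': the file has no header line, so fall
    # back to the table schema order.
    return list(schema_columns), False


schema = ["id", "name", "city"]
print(resolve_columns(schema, "1,alice,paris", header=False))
# prints (['id', 'name', 'city'], False)
```

With 'header'='false' the user no longer has to repeat the table columns in FILEHEADER; the first line is treated as data.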

Please vote:
+1: yes, agree to add the 'header' option
±0: abstain or no opinion
-1: no, veto this action; no need to add the 'header' option

Regards
David Cai

Erlu Chen

Jul 03, 2017; 1:12pm

Re: [Discussion] Add HEADER option to load data sql

This post was updated on Jul 04, 2017; 12:55am.

I vote +1.

There is no need to provide the file header.

I think the file header is unnecessary. It can be derived from the table schema, and it is inconvenient for users to specify the file header option when the header is complex.

Regards.
Chenerlu.

David CaiQiang

Jul 03, 2017; 3:48pm

Re: [Discussion] Add HEADER option to load data sql

Hi Erlu,
going by what you said, you should vote +1.
The 'header' option is different from the 'fileheader' option.

Best Regards
David Cai

wangbin

Jul 04, 2017; 1:21am

Re: [Discussion] Add HEADER option to load data sql

In reply to this post by David CaiQiang

I propose loading the CSV files by explicitly giving a table schema, while
using an option to ignore the CSV header if the file has one.

Erlu Chen

Jul 04, 2017; 1:36am

Re: [Discussion] Add HEADER option to load data sql

In reply to this post by David CaiQiang

Thanks for correcting me.

I have updated my opinion.

Regards.
Chenerlu.

kumarvishal09

Jul 04, 2017; 11:07am

Re: [Discussion] Add HEADER option to load data sql

In reply to this post by wangbin

+1
It will be useful when the CSV file header is the same as the table schema; in that case it is a pain for the user to pass the whole CSV header. But it depends completely on the user's scenario and how the CSV file is generated.

Regards
Kumar Vishal

Venkata Gollamudi

Jul 04, 2017; 11:48am

Re: [Discussion] Add HEADER option to load data sql

In reply to this post by wangbin

I agree that the user need not provide column names if no header is present in
the file and the column order is the same as the schema order.

However, a header=true option alone will not cover all the cases: header
present, header not present, override header, etc. I have added an intermediate
approach that covers all the cases and also takes care of the current default
values and backward compatibility.

csv file without header
1. FILEHEADER="col1,col2,col3", default: IGNORE_FIRST_LINE="FALSE"
   use given header
2. FILEHEADER="", default: IGNORE_FIRST_LINE="FALSE"
   use schema order

csv file with header
1. None, default: IGNORE_FIRST_LINE="FALSE"
   expects CSV first line as header.
2. FILEHEADER="col1,col2,col3", IGNORE_FIRST_LINE="TRUE"
   uses explicitly given header, ignoring header from file.
3. FILEHEADER="", IGNORE_FIRST_LINE="TRUE"
   uses schema order, ignoring header from file.
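
The five cases listed above can be read as a small decision table. The Python sketch below is only an illustration of that table, not CarbonData code: the function name `resolve_header` and its parameters are hypothetical, and rejecting IGNORE_FIRST_LINE="TRUE" without a FILEHEADER is an assumption, since that combination is not among the listed cases.

```python
from typing import List, Optional, Tuple


def resolve_header(
    schema_columns: List[str],
    first_line: str,
    fileheader: Optional[str] = None,
    ignore_first_line: bool = False,
) -> Tuple[List[str], bool]:
    """Return (column names, whether to skip the first CSV line).

    Hypothetical helper mirroring the five listed cases.
    """
    if fileheader is None:
        if ignore_first_line:
            # Not one of the listed cases; rejecting it is an assumption.
            raise ValueError("IGNORE_FIRST_LINE=TRUE needs a FILEHEADER value")
        # "None" case: the first CSV line is the header, so it names
        # the columns and is not loaded as data.
        return [c.strip() for c in first_line.split(",")], True
    if fileheader == "":
        # Empty FILEHEADER: use the table schema order.
        columns = list(schema_columns)
    else:
        # Explicit FILEHEADER: use the given column names.
        columns = [c.strip() for c in fileheader.split(",")]
    # The first line is dropped only when the user asks for it.
    return columns, ignore_first_line
```

For a file with a header line whose names should be ignored, `resolve_header(schema, line, fileheader="", ignore_first_line=True)` returns the schema columns and asks the caller to skip the first line.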

Regards,
Ramana
