[Discussion] Add HEADER option to load data sql

David CaiQiang
1. Background

a) Load data with the FILEHEADER option
load data inpath '<path>' into table <carbon_table_name> options('FILEHEADER'='col1,col2,col3')

This means the CSV files being loaded have no file header, so the FILEHEADER option is needed to specify it.

b) Load data without the FILEHEADER option
load data inpath '<path>' into table <carbon_table_name>

This means the CSV files being loaded have a file header, so the header line in the CSV files is used.
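To make the two existing modes concrete, here is a minimal Python sketch of how the column names could be resolved in each case. The function name `read_with_current_options` and the dict standing in for the OPTIONS(...) clause are hypothetical illustrations, not CarbonData's implementation.

```python
import csv
import io


def read_with_current_options(csv_text, options):
    """Return (header, data_rows) for the two existing load modes.

    Hypothetical sketch: `options` is a plain dict standing in for the
    OPTIONS(...) clause of LOAD DATA; this is not CarbonData's actual code.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if "FILEHEADER" in options:
        # a) FILEHEADER given: the file has no header line, so every row is data.
        header = [col.strip() for col in options["FILEHEADER"].split(",")]
        return header, rows
    # b) FILEHEADER absent: the first line of the file is taken as the header.
    if not rows:
        raise ValueError("CSV file is empty, so there is no header line to read")
    return rows[0], rows[1:]


header, data = read_with_current_options("1,a\n2,b\n", {"FILEHEADER": "id,name"})
print(header, data)  # ['id', 'name'] [['1', 'a'], ['2', 'b']]
```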

2. Issue

When we load CSV files that have no file header, and the header would be the same as the table schema, we can combine all the columns (in schema order) to form the file header. So I think it is unnecessary to make the user provide the file header.

3. Solution

Add a HEADER option to the load data SQL.
The HEADER option can be true or false. The default value is true.
When we load CSV files that have no file header, and the header would be the same as the table schema, add 'header'='false' to the load data SQL.
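A minimal sketch, assuming the semantics described above, of how the proposed HEADER option could sit alongside FILEHEADER. `resolve_header` and its arguments are hypothetical. The proposal does not say how HEADER should interact with FILEHEADER, so this sketch assumes FILEHEADER keeps its current meaning whenever it is given.

```python
import csv
import io


def resolve_header(csv_text, options, schema_columns):
    """Return (header, data_rows) with the proposed HEADER option.

    Hypothetical sketch: `options` mimics the OPTIONS(...) clause and
    `schema_columns` is the table's column list in schema order.
    """
    # Option keys are matched case-insensitively ('header' or 'HEADER').
    opts = {key.upper(): value for key, value in options.items()}
    header_flag = opts.get("HEADER", "true").strip().lower()  # default: true
    if header_flag not in ("true", "false"):
        raise ValueError("HEADER must be 'true' or 'false', got %r" % header_flag)
    rows = list(csv.reader(io.StringIO(csv_text)))

    if "FILEHEADER" in opts:
        # Assumed: existing behaviour unchanged, the user-supplied header wins.
        return [col.strip() for col in opts["FILEHEADER"].split(",")], rows
    if header_flag == "true":
        # Default: the first line of the CSV file is the header.
        if not rows:
            raise ValueError("CSV file is empty, so there is no header line to read")
        return rows[0], rows[1:]
    # 'header'='false': no header line, so combine all schema columns in order.
    return list(schema_columns), rows


print(resolve_header("1,a\n2,b\n", {"header": "false"}, ["id", "name"]))
# (['id', 'name'], [['1', 'a'], ['2', 'b']])
```

With 'header'='false' the user never has to spell out the column list; the schema order supplies it, which is exactly the case the proposal targets.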

Please vote:
+1: yes, agree to add the 'header' option
±0: abstain or no opinion
-1: no, veto this action; there is no need to add the 'header' option.

Regards
David Cai

Re: [Discussion] Add HEADER option to load data sql

Erlu Chen
+1

There is no need to provide the file header.

I think the file header is unnecessary. It can be derived from the table schema, and it is inconvenient for users to specify the FILEHEADER option when the header is complex.


Regards.
Chenerlu.

Re: [Discussion] Add HEADER option to load data sql

David CaiQiang
Hi Erlu,
    Based on what you said, you should vote +1.
    The 'header' option is different from the 'fileheader' option.
Best Regards
David Cai

Re: [Discussion] Add HEADER option to load data sql

wangbin
In reply to this post by David CaiQiang
I propose loading the CSV files by explicitly giving a table schema, while
using an option to ignore the CSV header if there is one.



--
View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Add-HEADER-option-to-load-data-sql-tp17080p17179.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive at Nabble.com.

Re: [Discussion] Add HEADER option to load data sql

Erlu Chen
In reply to this post by David CaiQiang
Thanks for correcting me.

I have updated my opinion.

Regards.
Chenerlu.


> On Jul 03, 2017; 11:48pm, David CaiQiang wrote:
> Hi Erlu,
>     Based on what you said, you should vote +1.
>     The 'header' option is different from the 'fileheader' option.
> Best Regards
> David Cai

Re: [Discussion] Add HEADER option to load data sql

kumarvishal09
In reply to this post by wangbin
+1
It will be useful when the CSV file header is the same as the table schema; in that case it is a pain for the user to pass the whole CSV header. But it depends entirely on the user scenario, i.e. how the CSV file is generated.

Regards
Kumar Vishal

> On 04-Jul-2017, at 06:51, wangbin <[hidden email]> wrote:
>
> I propose loading the CSV files by explicitly giving a table schema, while
> using an option to ignore the CSV header if there is one.

Re: [Discussion] Add HEADER option to load data sql

Venkata Gollamudi
In reply to this post by wangbin
I agree that users need not provide column names if no header is present in
the file and the column order is the same as the schema order.

However, a single header=true/false option will not cover all the cases: header
present, header not present, header overridden, etc. I have added an
intermediate approach that covers all the cases while also taking care of the
current default values and backward compatibility.

CSV file without header:
1. FILEHEADER="col1,col2,col3", IGNORE_FIRST_LINE="FALSE" (default)
   uses the given header.
2. FILEHEADER="", IGNORE_FIRST_LINE="FALSE" (default)
   uses the schema order.

CSV file with header:
1. No FILEHEADER, IGNORE_FIRST_LINE="FALSE" (default)
   expects the first line of the CSV file to be the header.
2. FILEHEADER="col1,col2,col3", IGNORE_FIRST_LINE="TRUE"
   uses the explicitly given header, ignoring the header in the file.
3. FILEHEADER="", IGNORE_FIRST_LINE="TRUE"
   uses the schema order, ignoring the header in the file.
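The five cases above can be sketched as one resolution function. `resolve_header_v2` is a hypothetical name, and `file_header=None` stands for "FILEHEADER not given". The matrix does not cover IGNORE_FIRST_LINE="TRUE" without FILEHEADER, so this sketch rejects that combination; that is an assumption, not part of the proposal.

```python
import csv
import io


def resolve_header_v2(csv_text, schema_columns, file_header=None, ignore_first_line=False):
    """Return (header, data_rows) following the FILEHEADER/IGNORE_FIRST_LINE matrix.

    Hypothetical sketch: file_header=None means FILEHEADER is not given,
    "" means "use the schema order", otherwise a comma-separated column list.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if file_header is None:
        if ignore_first_line:
            # Not one of the listed cases; rejecting it is an assumption.
            raise ValueError("IGNORE_FIRST_LINE=TRUE requires FILEHEADER")
        # CSV with header, no FILEHEADER: the first CSV line is the header.
        if not rows:
            raise ValueError("CSV file is empty, so there is no header line to read")
        return rows[0], rows[1:]
    if ignore_first_line:
        # CSV with header that the user overrides: drop the file's header line.
        rows = rows[1:]
    if file_header == "":
        header = list(schema_columns)  # use schema order
    else:
        header = [col.strip() for col in file_header.split(",")]  # use given header
    return header, rows


print(resolve_header_v2("a,b\n1,x\n", ["id", "name"], file_header="", ignore_first_line=True))
# (['id', 'name'], [['1', 'x']])
```

Keeping IGNORE_FIRST_LINE="FALSE" as the default means existing loads that omit FILEHEADER behave exactly as before, which is the backward-compatibility point made above.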

Regards,
Ramana

On Tue, Jul 4, 2017 at 6:51 AM, wangbin <[hidden email]> wrote:

> I propose loading the CSV files by explicitly giving a table schema, while
> using an option to ignore the CSV header if there is one.