[jira] [Updated] (CARBONDATA-2110) option of TempCsv should be removed since the default delimiter may conflict with field values

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin updated CARBONDATA-2110:
-----------------------------------
Description:
Currently in CarbonData, an option named ‘tempCSV’ is available when loading a DataFrame.

When this option is enabled, CarbonData first writes the DataFrame to a *standard* CSV file and then loads the data from that file.

The delimiters of the standard CSV file (field delimiter, escape char, quote char, multi-line handling, line separator, and so on) may conflict with the actual field values. For example, if a field contains ',' and we save the tempCSV using ',' as the field separator, it will cause problems in the subsequent data load.
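The conflict described above can be reproduced outside CarbonData. The following is only a minimal sketch in plain Python, not CarbonData's actual tempCSV code path; the row contents and variable names are made up for illustration. It shows how a value containing the field delimiter shifts the column count once the line is split again, and that quoting only helps when writer and reader agree on the same quote char.

```python
import csv
import io

# Hypothetical row: the second field legitimately contains a comma.
row = ["1", "Shenzhen, China", "42"]

# Naive writer: join the fields with the delimiter, no quoting or escaping.
naive_line = ",".join(row)

# A reader that splits on the delimiter now sees four fields instead of three.
naive_fields = naive_line.split(",")

# A writer and reader that share the same quoting rules round-trip the row.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(row)
quoted_line = buffer.getvalue().strip()
quoted_fields = next(csv.reader(io.StringIO(quoted_line), delimiter=",", quotechar='"'))

print(naive_fields)    # the comma inside the value was treated as a separator
print(quoted_fields)   # the original three fields
```

The same kind of collision can happen with the quote char, escape char, or line separator, so no fixed set of defaults is safe for arbitrary DataFrame content.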

Since we cannot be sure about the content of the DataFrame, I think it is better to deprecate this option. To remain compatible with existing jobs, users can still set this option but will get a warning about it.
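The proposed behaviour (option still accepted, but a warning is raised) could look roughly like the sketch below. This is plain Python for illustration only: resolve_load_options is a hypothetical helper, not a CarbonData function, and the warning text is an assumption.

```python
import warnings


def resolve_load_options(options):
    """Hypothetical helper: keep accepting 'tempCSV' but warn when it is enabled."""
    resolved = dict(options)
    if str(resolved.get("tempCSV", "false")).lower() == "true":
        warnings.warn(
            "Option 'tempCSV' is deprecated: its delimiters may conflict "
            "with field values in the DataFrame.",
            DeprecationWarning,
            stacklevel=2,
        )
    # The option is left untouched, so existing jobs behave as before.
    return resolved


# Usage: capture the warning to show that the option is still honoured.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = resolve_load_options({"tempCSV": "true"})

print(resolved["tempCSV"], len(caught))
```

Warning instead of rejecting the option keeps existing load jobs running while signalling that the option should no longer be relied on.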

was:
Currently in CarbonData, an option named ‘tempCSV’ is available when loading a DataFrame.

When this option is enabled, CarbonData first writes the DataFrame to a *standard* CSV file and then loads the data from that file.

The delimiters of the standard CSV file (field delimiter, escape char, quote char, multi-line handling, line separator, and so on) may conflict with the actual field values. For example, if a field contains ',', then saving the tempCSV using ',' as the field separator will cause problems.

So I think it is better to deprecate this option. To remain compatible with existing jobs, users can still set this option but will get a warning about it.

> option of TempCsv should be removed since the default delimiter may conflict with field values
> ----------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-2110
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2110
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Reporter: xuchuanyin
> Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)