[jira] [Updated] (CARBONDATA-400) [Bad Records] Load data is fail and displaying the string value in beeline as exception


Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MAKAMRAGHUVARDHAN updated CARBONDATA-400:
-----------------------------------------
    Description:
Steps
1. Create table
CREATE TABLE String_test2 (string_col string) STORED BY 'org.apache.carbondata.format';
2. Load the data with the option 'BAD_RECORDS_ACTION'='FORCE', where the CSV contains a string value that exceeds the maximum column length.

LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' into table String_test2 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='string_col');
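To reproduce, the CSV needs a single field longer than the parser's default limit of 100000 characters per column. A minimal sketch that writes such a file (illustrative Python; a hypothetical local path stands in for the HDFS location used in the report):

```python
# Write a one-column CSV whose field exceeds the default
# maxCharsPerColumn of 100000 (hypothetical local path; the
# report loads from an HDFS location).
limit = 100000
value = "hello" * (limit // 5 + 1)  # 100005 chars, just over the limit

with open("string5.csv", "w") as f:
    f.write(value + "\n")
```

Loading this file with the LOAD DATA command above triggers the TextParsingException shown below.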


Actual Result: The data load fails, and the full string value is printed in Beeline as part of the exception trace.

Expected Result: A clear error message should be displayed, and the exception trace should not be printed on the console.

Exception thrown on console is as shown below.
Error: com.univocity.parsers.common.TextParsingException: Error processing input: Length of parsed input (100001) exceeds the maximum number of characters defined in your parser settings (100000).
Hint: Number of characters processed may have exceeded limit of 100000 characters per column. Use settings.setMaxCharsPerColumn(int) to define the maximum number of characters a column can have
Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
Parser Configuration: CsvParserSettings:
        Column reordering enabled=true
        Empty value=null
        Header extraction enabled=false
        Headers=null
        Ignore leading whitespaces=true
        Ignore trailing whitespaces=true
        Input buffer size=128
        Input reading on separate thread=false
        Line separator detection enabled=false
        Maximum number of characters per column=100000
        Maximum number of columns=20480
        Null value=
        Number of records to read=all
        Parse unescaped quotes=true
        Row processor=none
        Selected fields=none
        Skip empty lines=true
Format configuration:
        CsvFormat:
                Comment character=#
                Field delimiter=,
                Line separator (normalized)=\n
                Line separator sequence=\n
                Quote character="
                Quote escape character=quote escape
                Quote escape escape character=\0, line=0, char=100002. Content parsed: [hellohowareyouwelcomehellohellohellohellohellohellohellohelloheellooabcdefghijklmnopqrstuvwxyzabcqwertuyioplkjhgfdsazxcvbnmpoiuytrewqasdfghjklmnbvcxzasdghskhdgkhdbkshkjchskdhfssudkdjdudusdjhdshdshsjddshjdkdhgdhdshdhdududushdudududududududududududududududududuudududududududuudududududududududududududududududududududududududududuhellohowareyouwelcomehellohellohellohellohellohellohellohelloheellooabcdefghijklmnopqrstuvwxyzabcqwertuyioplkjhgfdsazxcvbnmpoiuytrewqasdfghjklmnbvcxzasdghskhdgkhdbkshkjchskdhfssudkdjdudusdjhdshdshsjddshjdkdhgdhdshdhdududushdudududududududududududududududududuudududududududuududududududududuu
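The limit the hint refers to is a simple per-column guard; a fix along the lines the expected result asks for would report only the offending column and the lengths involved, never the raw field content. A minimal sketch of that idea (illustrative Python, not CarbonData's or univocity's actual code; `ColumnTooLongError` and `check_column` are hypothetical names):

```python
class ColumnTooLongError(Exception):
    """Raised when a parsed field exceeds the per-column character limit."""


def check_column(value: str, max_chars: int = 100000, col: int = 0) -> str:
    # Reject an oversized field, but report only metadata (column index
    # and lengths) -- do not echo the field content into the message.
    if len(value) > max_chars:
        raise ColumnTooLongError(
            f"column {col}: parsed length {len(value)} exceeds "
            f"the limit of {max_chars} characters"
        )
    return value


try:
    check_column("a" * 100001)
except ColumnTooLongError as err:
    print(err)  # concise message; the 100001-char value is not printed
```

With such a guard, the console would show only the short diagnostic, instead of the parser dumping the entire parsed content as in the trace above.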

  was:
Steps
1. Create table
CREATE TABLE String_test2 (string_col string) STORED BY 'org.apache.carbondata.format';
2. Load the data with parameter 'BAD_RECORDS_ACTION'='FORCE' and csv contains a string value that is out of boundary.

LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' into table String_test2 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='string_col');

Actual Result: Load data is failed and displaying the string value in beeline as exception trace.
Expected Result:Should display a valid exception.


> [Bad Records] Load data is fail and displaying the string value in beeline as exception
> ---------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-400
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-400
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 0.1.0-incubating
>         Environment: 3node cluster
>            Reporter: MAKAMRAGHUVARDHAN
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)