[jira] [Updated] (CARBONDATA-400) [Bad Records] Load data is fail and displaying the string value in beeline as exception


Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MAKAMRAGHUVARDHAN updated CARBONDATA-400:
-----------------------------------------
    Description:
Steps
1. Create table
CREATE TABLE String_test2 (string_col string) STORED BY 'org.apache.carbondata.format';
2. Load the data with the option 'BAD_RECORDS_ACTION'='FORCE', where the CSV contains a string value that exceeds the maximum column length.

LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' into table String_test2 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='string_col');
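To reproduce, the CSV needs a single field longer than the parser's default limit of 100000 characters per column. A minimal sketch that writes such a file (illustrative Python; a hypothetical local path stands in for the HDFS location used in the report):

```python
# Write a one-column CSV whose field exceeds the default
# maxCharsPerColumn of 100000 (hypothetical local path; the
# report loads from an HDFS location).
limit = 100000
value = "hello" * (limit // 5 + 1)  # 100005 chars, just over the limit

with open("string5.csv", "w") as f:
    f.write(value + "\n")
```

Loading this file with the LOAD DATA command above triggers the TextParsingException shown below.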


Actual Result: The data load fails, and the full string value is printed in Beeline as part of the exception trace.

Expected Result: A clear error message should be displayed, and the exception trace should not be printed on the console.

Exception thrown on console is as shown below.
Error: com.univocity.parsers.common.TextParsingException: Error processing input: Length of parsed input (100001) exceeds the maximum number of characters defined in your parser settings (100000).
Hint: Number of characters processed may have exceeded limit of 100000 characters per column. Use settings.setMaxCharsPerColumn(int) to define the maximum number of characters a column can have
Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
Parser Configuration: CsvParserSettings:
        Column reordering enabled=true
        Empty value=null
        Header extraction enabled=false
        Headers=null
        Ignore leading whitespaces=true
        Ignore trailing whitespaces=true
        Input buffer size=128
        Input reading on separate thread=false
        Line separator detection enabled=false
        Maximum number of characters per column=100000
        Maximum number of columns=20480
        Null value=
        Number of records to read=all
        Parse unescaped quotes=true
        Row processor=none
        Selected fields=none
        Skip empty lines=true
Format configuration:
        CsvFormat:
                Comment character=#
                Field delimiter=,
                Line separator (normalized)=\n
                Line separator sequence=\n
                Quote character="
                Quote escape character=quote escape
                Quote escape escape character=\0, line=0, char=100002. Content parsed: [hellohowareyouwelcomehellohellohellohellohellohellohellohelloheellooabcdefghijklmnopqrstuvwxyzabcqwertuyioplkjhgfdsazxcvbnmpoiuytrewqasdfghjklmnbvcxzasdghskhdgkhdbkshkjchskdhfssudkdjdudusdjhdshdshsjddshjdkdhgdhdshdhdududushdudududududududududududududududududuudududududududuudududududududududududududududududududududududududududuhellohowareyouwelcomehellohellohellohellohellohellohellohelloheellooabcdefghijklmnopqrstuvwxyzabcqwertuyioplkjhgfdsazxcvbnmpoiuytrewqasdfghjklmnbvcxzasdghskhdgkhdbkshkjchskdhfssudkdjdudusdjhdshdshsjddshjdkdhgdhdshdhdududushdudududududududududududududududududuudududududududuududududududududuu
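The limit the hint refers to is a simple per-column guard; a fix along the lines the expected result asks for would report only the offending column and the lengths involved, never the raw field content. A minimal sketch of that idea (illustrative Python, not CarbonData's or univocity's actual code; `ColumnTooLongError` and `check_column` are hypothetical names):

```python
class ColumnTooLongError(Exception):
    """Raised when a parsed field exceeds the per-column character limit."""


def check_column(value: str, max_chars: int = 100000, col: int = 0) -> str:
    # Reject an oversized field, but report only metadata (column index
    # and lengths) -- do not echo the field content into the message.
    if len(value) > max_chars:
        raise ColumnTooLongError(
            f"column {col}: parsed length {len(value)} exceeds "
            f"the limit of {max_chars} characters"
        )
    return value


try:
    check_column("a" * 100001)
except ColumnTooLongError as err:
    print(err)  # concise message; the 100001-char value is not printed
```

With such a guard, the console would show only the short diagnostic, instead of the parser dumping the entire parsed content as in the trace above.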

  was:
Steps
1. Create table
CREATE TABLE String_test2 (string_col string) STORED BY 'org.apache.carbondata.format';
2. Load the data with parameter 'BAD_RECORDS_ACTION'='FORCE' and csv contains a string value that is out of boundary.

LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' into table String_test2 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='string_col');

Actual Result: Load data is failed and displaying the string value in beeline as exception trace.
Expected Result:Should display a valid exception.


> [Bad Records] Load data is fail and displaying the string value in beeline as exception
> ---------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-400
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-400
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 0.1.0-incubating
>         Environment: 3node cluster
>            Reporter: MAKAMRAGHUVARDHAN
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)