[jira] [Commented] (CARBONDATA-260) Equal or lesser value of MAXCOLUMNS option than column count in CSV header results into array index of bound exception

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506741#comment-15506741 ]

ASF GitHub Bot commented on CARBONDATA-260:
-------------------------------------------

GitHub user manishgupta88 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/180

[CARBONDATA-260] Equal or lesser value of MAXCOLUMNS option than column count in CSV header results into array index of bound exception

Problem: A MAXCOLUMNS option value equal to or less than the column count in the CSV header results in an array index out of bounds exception.

Analysis: If the column count in the CSV header is greater than or equal to the MAXCOLUMNS option value, the Univocity CSV parser throws an array index out of bounds exception. While parsing a row, the parser adds each parsed value to an array and increments the index; after incrementing, it performs one more operation using the incremented index value, which leads to the exception. The relevant code snippet from the CSV parser is shown below.

    public void valueParsed() {
        this.parsedValues[column++] = appender.getAndReset();
        this.appender = appenders[column];
    }

e.g. In the above code, if the arrays have length 7, the valid indices are 0-6. When column is 6, the first line stores the value and increments column to 7, so the second line accesses appenders[7] and throws ArrayIndexOutOfBoundsException.
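The off-by-one described above can be reproduced with a small standalone sketch. The class and method names here (ValueParsedDemo, Output, parseRow) are hypothetical stand-ins written for illustration, not the Univocity classes; only the two-line body of valueParsed() mirrors the quoted snippet, with StringBuilder standing in for the parser's appender type.

```java
class ValueParsedDemo {

    // Hypothetical stand-in for the parser's output buffer (not the Univocity class).
    static final class Output {
        final String[] parsedValues;
        final StringBuilder[] appenders;
        StringBuilder appender;
        int column = 0;

        Output(int maxColumns) {
            parsedValues = new String[maxColumns];
            appenders = new StringBuilder[maxColumns];
            for (int i = 0; i < maxColumns; i++) {
                appenders[i] = new StringBuilder();
            }
            appender = appenders[0];
        }

        void append(String s) {
            appender.append(s);
        }

        // Same shape as the quoted snippet: store, increment, then index again.
        void valueParsed() {
            parsedValues[column++] = appender.toString();
            appender.setLength(0);
            appender = appenders[column]; // fails when column == maxColumns
        }
    }

    // Returns true if every value in the row was parsed without an exception.
    static boolean parseRow(int maxColumns, String[] row) {
        Output out = new Output(maxColumns);
        try {
            for (String value : row) {
                out.append(value);
                out.valueParsed();
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] row = {"a", "b", "c", "d", "e", "f", "g"}; // 7 columns
        System.out.println("maxColumns=7: " + parseRow(7, row));
        System.out.println("maxColumns=8: " + parseRow(8, row));
    }
}
```

With a limit equal to the column count, the last call to valueParsed() indexes one slot past the end of the array; one extra slot is enough to avoid it.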

Fix: Whenever the column count in the CSV header is equal to or greater than the MAXCOLUMNS option value (or the default value), increment it by 1.
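One possible reading of this fix, as a minimal sketch: the limit handed to the parser becomes the header column count plus 1 whenever the header is at least as wide as the configured limit. The class name, method name, and the DEFAULT_MAX_COLUMNS value below are assumptions made for illustration; they are not taken from the pull request.

```java
final class MaxColumnsAdjuster {

    // Placeholder default for illustration only; the real default is not stated in this message.
    static final int DEFAULT_MAX_COLUMNS = 2000;

    // maxColumnsOption is the user-supplied MAXCOLUMNS value, or null if the option was not given.
    static int effectiveMaxColumns(Integer maxColumnsOption, int headerColumnCount) {
        int configured = (maxColumnsOption != null) ? maxColumnsOption : DEFAULT_MAX_COLUMNS;
        if (headerColumnCount >= configured) {
            // Leave one spare slot so the parser can index one past the last column.
            return headerColumnCount + 1;
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxColumns(7, 7));
        System.out.println(effectiveMaxColumns(5, 7));
        System.out.println(effectiveMaxColumns(10, 7));
    }
}
```

Under this reading, a configured limit larger than the header width is passed through unchanged, so the adjustment only affects loads that would otherwise hit the exception.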

Impact: Data load flow

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/incubator-carbondata maxcolumns_array_indexOfBound

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #180

----
commit 3f32424e55615c8e45470d5169b817f9f703dc3e
Author: manishgupta88 <[hidden email]>
Date: 2016-09-20T14:21:33Z

Problem: A MAXCOLUMNS option value equal to or less than the column count in the CSV header results in an array index out of bounds exception.

Analysis: If the column count in the CSV header is greater than or equal to the MAXCOLUMNS option value, the Univocity CSV parser throws an array index out of bounds exception. While parsing a row, the parser adds each parsed value to an array and increments the index; after incrementing, it performs one more operation using the incremented index value, which leads to the exception. The relevant code snippet from the CSV parser is shown below.

    public void valueParsed() {
        this.parsedValues[column++] = appender.getAndReset();
        this.appender = appenders[column];
    }

Fix: Whenever the column count in the CSV header is equal to or greater than the MAXCOLUMNS option value (or the default value), increment it by 1.

Impact: Data load flow

----

> Equal or lesser value of MAXCOLUMNS option than column count in CSV header results into array index of bound exception
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-260
> URL: https://issues.apache.org/jira/browse/CARBONDATA-260
> Project: CarbonData
> Issue Type: Bug
> Reporter: Manish Gupta
> Assignee: Manish Gupta
>
> If the column count in the CSV header is greater than or equal to the MAXCOLUMNS option value, the Univocity CSV parser throws an array index out of bounds exception. While parsing a row, the parser adds each parsed value to an array and increments the index; after incrementing, it performs one more operation using the incremented index value, which leads to the exception.
> java.lang.OutOfMemoryError: Java heap space
> at com.univocity.parsers.common.ParserOutput.<init>(ParserOutput.java:86)
> at com.univocity.parsers.common.AbstractParser.<init>(AbstractParser.java:66)
> at com.univocity.parsers.csv.CsvParser.<init>(CsvParser.java:50)
> at org.apache.carbondata.processing.csvreaderstep.UnivocityCsvParser.initialize(UnivocityCsvParser.java:114)
> at org.apache.carbondata.processing.csvreaderstep.CsvInput.doProcessUnivocity(CsvInput.java:427)
> at org.apache.carbondata.processing.csvreaderstep.CsvInput.access$100(CsvInput.java:60)
> at org.apache.carbondata.processing.csvreaderstep.CsvInput$1.call(CsvInput.java:389)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)