GitHub user kunal642 opened a pull request:
https://github.com/apache/incubator-carbondata/pull/802 [CARBONDATA-926] Changed max columns from static value to configurable value in Load DDL Changes done:- 1. Earlier the value for max columns that can be read by the csv parser was hard coded. Now the value would be read from the query options and if no option is provided then the default value of 0 would be taken. 2. Added query id in compaction flow for driver statistics. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/incubator-carbondata CARBONDATA-926 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/802.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #802 ---- commit cf854888f2914bec2239b594deb9edefd72c123e Author: kunal642 <[hidden email]> Date: 2017-04-15T08:18:17Z changed max columns from static value to configurable ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1635/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1639/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1640/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1641/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1643/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1647/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1648/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/802#discussion_r111665089 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java --- @@ -220,8 +241,16 @@ private CsvParserSettings extractCsvParserSettings(Configuration job) { parserSettings.setIgnoreTrailingWhitespaces(false); parserSettings.setSkipEmptyLines(false); parserSettings.setMaxCharsPerColumn(100000); - // TODO get from csv file. - parserSettings.setMaxColumns(1000); + String maxColumns = job.get(MAX_COLUMNS); + int maxColumnsInt = 0; --- End diff -- default value should be taken and same should be logged. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/802#discussion_r111665083 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java --- @@ -231,6 +260,31 @@ private CsvParserSettings extractCsvParserSettings(Configuration job) { return parserSettings; } + /** + * This method will decide the number of columns to be parsed for a row by Csv parser + * + * @param columnCountInSchema total number of columns in schema + * @return + */ + private int getMaxColumnsForParsing(int columnCountInSchema, int maxColumns) { + int maxNumberOfColumnsForParsing = DEFAULT_MAX_NUMBER_OF_COLUMNS_FOR_PARSING; + if (maxColumns > 0) { + if (columnCountInSchema >= maxColumns) { + maxNumberOfColumnsForParsing = columnCountInSchema + 1; --- End diff -- taking more columns than the configured max value, should not happen. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/802#discussion_r111667378 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -342,7 +345,23 @@ object CommonUtil { + "the same. Input file : " + csvFile) } } - csvColumns } + + def validateMaxColumns(csvHeaders: Array[String], maxColumns: String): Unit = { + if (maxColumns != null) { + try { + val maxColumnsInt = maxColumns.toInt + if( csvHeaders.length > maxColumnsInt) { + sys.error("csv headers should be less than the max columns") --- End diff -- also need to print max Value which is considered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user gvramana commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Handle all the validation in driver and remove from executor. parameters: csvheadercolumns, maxcolumns(default 2000), Threashold(20000). 1) User configures both csvheadercolumns, maxcolumns, - if csvheadercolumns >= maxcolumns, give error " csvheadercolumns should be less than max columns <value>" - if maxcolumns > threashold, give error "max columns cannot be more than threashold(20,000)" 2) User configures csvheadercolumns - if csvheadercolumns >= maxcolumns(default) maxcolumns = csvheadercolumns+1 - if csvheadercolumns >= threashold, give error " csvheadercolumns should be less than max threashold(20,000)" 3) User configures nothing - if csvheadercolumns >= maxcolumns(default) maxcolumns = csvheadercolumns+1 - if csvheadercolumns >= threashold, give error " csvheadercolumns should be less than max threashold(20,000)" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1657/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1663/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user gvramana commented on the issue:
https://github.com/apache/incubator-carbondata/pull/802 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/802 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Free forum by Nabble | Edit this page |