GitHub user QiangCai opened a pull request:
https://github.com/apache/carbondata/pull/1133

[CARBONDATA-1261] Load data sql add 'header' option

When CSV files are loaded without a file header and their columns match the table schema, the user can add 'header'='false' to the LOAD DATA SQL instead of having to provide the file header explicitly.

Mailing list discussion: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Add-HEADER-option-to-load-data-sql-td17080.html

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/carbondata addheaderoption

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1133.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1133

----
commit a065166776d1c9c63c2cd2080265553c61c49846
Author: QiangCai <[hidden email]>
Date: 2017-07-04T04:11:33Z

    add header option

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
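The rules described in this pull request form a small decision table. There are three cases for the 'header' option (absent, 'true', or 'false'), and each one is combined with whether FILEHEADER is given.

The Scala sketch below models that table under two assumptions:

- Options arrive as a Map[String, String] keyed by lower-case option names.
- The table's column names are available in create order.

`resolveFileHeader` and `LoadOptionError` are hypothetical names, not CarbonData APIs. The error messages reuse the wording suggested later in review.

```scala
// A minimal sketch, assuming lower-cased option keys and known schema columns.
// `resolveFileHeader` and `LoadOptionError` are hypothetical, not CarbonData APIs.
object FileHeaderResolution {
  final case class LoadOptionError(msg: String) extends Exception(msg)

  /** Some(header) when the loader must use an explicit column list;
    * None when the first line of each CSV file is the header. */
  def resolveFileHeader(options: Map[String, String],
                        schemaColumns: Seq[String]): Option[String] = {
    val fileHeader = options.getOrElse("fileheader", "")
    // Parse 'header' case-insensitively; anything but true/false is rejected
    val headerFlag: Option[Boolean] = options.get("header").map { raw =>
      raw.trim.toLowerCase(java.util.Locale.ROOT) match {
        case "true"  => true
        case "false" => false
        case _ =>
          throw LoadOptionError("'header' option should be either 'true' or 'false'")
      }
    }
    headerFlag match {
      // no 'header' option: use FILEHEADER if given, else the first CSV line
      case None =>
        if (fileHeader.nonEmpty) Some(fileHeader) else None
      // CSV files carry their own header line, so FILEHEADER is rejected
      case Some(true) =>
        if (fileHeader.nonEmpty) {
          throw LoadOptionError(
            "When 'header' option is true, 'fileheader' option is not required")
        }
        None
      // no header line: prefer FILEHEADER, else derive it from the table schema
      case Some(false) =>
        Some(if (fileHeader.nonEmpty) fileHeader else schemaColumns.mkString(","))
    }
  }
}
```

On the SQL side this corresponds to adding OPTIONS('header'='false') to a LOAD DATA statement. In the sketch, such a load against a table with columns id, name and city resolves to the header "id,name,city".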
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1133

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/308/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1133

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2894/
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1133#discussion_r125681145

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -441,6 +441,38 @@ case class LoadTable(
       val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
       val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
       ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
+
+      // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
+      // we should use table schema to generate file header.
+      val headerOption = options.get("header")
+      headerOption.isDefined match {
+        case true =>
+          // whether the csv file has file header
+          // the default value is true
+          val header = try {
+            headerOption.get.toBoolean
+          } catch {
+            case ex: IllegalArgumentException =>
+              throw new MalformedCarbonCommandException(
+                "The 'header' option is not correct. " + ex.getMessage)
+          }
+          header match {
+            case true =>
+              if (fileHeader.nonEmpty) {
+                throw new MalformedCarbonCommandException(
+                  "When 'header' option is true, no need 'fileheader' option.")
--- End diff --

Suggest changing the message to "When 'header' option is true, 'fileheader' option is not required".
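The reviewer's point concerns the error raised when 'header'='true' and FILEHEADER are both supplied. With 'header'='true', the first line of each CSV file already names the columns, so an explicit FILEHEADER would presumably be a second, possibly conflicting, source of column names.

The hypothetical helper below reports that conflict using the suggested wording. It is a hedged illustration, not CarbonData code. The diff throws MalformedCarbonCommandException; this sketch instead returns an Either, so the caller decides how to surface the error.

```scala
// Hypothetical helper, not CarbonData code: reports the 'header'/'fileheader'
// conflict with the wording suggested in review, as a value instead of a throw.
object HeaderConflictCheck {
  val ConflictMessage =
    "When 'header' option is true, 'fileheader' option is not required"

  /** Left(message) if the combination is contradictory, Right(()) otherwise. */
  def checkConflict(header: Boolean, fileHeader: String): Either[String, Unit] =
    if (header && fileHeader.trim.nonEmpty) Left(ConflictMessage) else Right(())
}
```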
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1133#discussion_r125682625

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithFileHeaderException.scala ---
@@ -56,6 +56,53 @@ class TestLoadDataWithFileHeaderException extends QueryTest with BeforeAndAfterA
     }
   }
+  test("test load data with header=false, but without fileheader") {
--- End diff --

I think one more test case could be added, using an invalid value for 'header'.
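A test for the invalid-value case might look roughly like the following. The real suite extends QueryTest and would presumably use ScalaTest's `intercept`.

To keep this sketch runnable without Spark or ScalaTest, it uses three hypothetical stand-ins: `parseHeaderOption`, `expectMalformed` and `MalformedOptionException`. They mirror the `toBoolean` handling shown in the diff.

```scala
// Dependency-free sketch of the extra test suggested in review.
// `parseHeaderOption`, `expectMalformed` and `MalformedOptionException` are
// hypothetical stand-ins, not CarbonData or ScalaTest APIs.
object InvalidHeaderValueTest {
  final class MalformedOptionException(msg: String) extends Exception(msg)

  // Like the diff: String.toBoolean accepts only "true"/"false" (any case)
  // and throws IllegalArgumentException for anything else.
  def parseHeaderOption(value: String): Boolean =
    try value.toBoolean
    catch {
      case ex: IllegalArgumentException =>
        throw new MalformedOptionException(
          "'header' option should be either 'true' or 'false'. " + ex.getMessage)
    }

  // Minimal stand-in for ScalaTest's intercept: returns the expected exception,
  // or fails with AssertionError if `body` completes normally.
  def expectMalformed(body: => Any): MalformedOptionException =
    try {
      body
      throw new AssertionError("expected MalformedOptionException")
    } catch {
      case e: MalformedOptionException => e
    }

  def run(): Unit = {
    val e = expectMalformed(parseHeaderOption("invalid"))
    assert(e.getMessage.startsWith("'header' option should be either 'true' or 'false'"))
    assert(parseHeaderOption("TRUE"))
    assert(!parseHeaderOption("false"))
  }
}
```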
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1133#discussion_r125682912

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -441,6 +441,38 @@ case class LoadTable(
       val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
       val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
       ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
+
+      // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
+      // we should use table schema to generate file header.
+      val headerOption = options.get("header")
+      headerOption.isDefined match {
--- End diff --

I feel an if-check would be better here, since the false case is empty.
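The sketch below illustrates the reviewer's suggestion by contrasting three equivalent forms:

- the Boolean match used in the diff;
- a plain `if` check;
- `Option.foreach`, a further common idiom that avoids calling `.get`.

`OptionBranching` and the `onHeader` callback are hypothetical names used only for illustration. All three methods call `onHeader` exactly when a 'header' entry is present.

```scala
// Illustrative comparison; `OptionBranching` and `onHeader` are hypothetical.
object OptionBranching {
  // Style in the diff: match on a Boolean, leaving an empty `false` branch.
  def viaBooleanMatch(options: Map[String, String])(onHeader: String => Unit): Unit = {
    val headerOption = options.get("header")
    headerOption.isDefined match {
      case true  => onHeader(headerOption.get)
      case false => // nothing to do
    }
  }

  // Reviewer's suggestion: a plain if-check with no dead branch.
  def viaIf(options: Map[String, String])(onHeader: String => Unit): Unit = {
    val headerOption = options.get("header")
    if (headerOption.isDefined) onHeader(headerOption.get)
  }

  // Alternative idiom: let Option run the callback only when a value exists.
  def viaForeach(options: Map[String, String])(onHeader: String => Unit): Unit =
    options.get("header").foreach(onHeader)
}
```

The `if` form keeps the shape of the surrounding code while dropping the empty branch. The `foreach` form is more compact but makes the branch less visible to readers unfamiliar with Option.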
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1133#discussion_r125683302

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -441,6 +441,38 @@ case class LoadTable(
       val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
       val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
       ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
+
+      // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
+      // we should use table schema to generate file header.
+      val headerOption = options.get("header")
+      headerOption.isDefined match {
+        case true =>
+          // whether the csv file has file header
+          // the default value is true
+          val header = try {
+            headerOption.get.toBoolean
+          } catch {
+            case ex: IllegalArgumentException =>
+              throw new MalformedCarbonCommandException(
+                "The 'header' option is not correct. " + ex.getMessage)
--- End diff --

Suggest changing the message to "'header' option should be either 'true' or 'false'".
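One way to produce the suggested message without relying on the exception from `String.toBoolean` is to match on the normalised value and return an Either. This is a sketch, not the code that was merged.

`parseHeader` is a hypothetical name. The sketch also trims surrounding whitespace, which is an assumption: it makes the parser slightly more lenient than `toBoolean`, which rejects " true".

```scala
// Sketch only: `HeaderValueParsing` and `parseHeader` are hypothetical names.
object HeaderValueParsing {
  val InvalidHeaderMessage = "'header' option should be either 'true' or 'false'"

  // Case-insensitive, whitespace-tolerant parse; reports the raw value on failure.
  def parseHeader(raw: String): Either[String, Boolean] =
    raw.trim.toLowerCase(java.util.Locale.ROOT) match {
      case "true"  => Right(true)
      case "false" => Right(false)
      case _       => Left(s"$InvalidHeaderMessage, but got '$raw'")
    }
}
```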
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1133

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/353/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1133

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2940/
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/1133

Fixed all review comments.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/1133

LGTM
Github user QiangCai closed the pull request at:
https://github.com/apache/carbondata/pull/1133