Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

GitHub user QiangCai opened a pull request:

https://github.com/apache/carbondata/pull/1133

[CARBONDATA-1261] Load data sql add 'header' option

When CSV files are loaded without a header row and their columns follow the table schema, add 'header'='false' to the LOAD DATA SQL; the user then no longer needs to provide the 'fileheader' option.

Mailing list discussion:
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Add-HEADER-option-to-load-data-sql-td17080.html
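A minimal Scala sketch of the behaviour described above. It makes two assumptions:

- Load options arrive as a lower-case `Map[String, String]`, as in the `options.get("header")` call in the diff.
- The table's column names are available in schema order.

`resolveFileHeader` and `schemaColumns` are hypothetical names, not CarbonData API. The PR's error checks are left out.

```scala
object HeaderOptionSketch {
  // Hypothetical helper (not CarbonData's API): decide which file header a load should use.
  //  - header = false and no fileheader: build the header from the table schema
  //  - otherwise: keep the user's fileheader (an empty string is assumed to mean
  //    "read the header from the CSV's first line")
  def resolveFileHeader(options: Map[String, String], schemaColumns: Seq[String]): String = {
    val fileHeader = options.getOrElse("fileheader", "")
    // The PR treats a missing 'header' option as true, i.e. the CSV has a header row.
    val csvHasHeader = options.get("header").forall(_.toBoolean)
    if (!csvHasHeader && fileHeader.isEmpty) schemaColumns.mkString(",")
    else fileHeader
  }
}
```

In this sketch, a table with columns `id`, `name`, `city` loaded with `OPTIONS('header'='false')` and no `fileheader` gets the header `id,name,city`.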

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/QiangCai/carbondata addheaderoption

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1133.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1133

----
commit a065166776d1c9c63c2cd2080265553c61c49846
Author: QiangCai <[hidden email]>
Date: 2017-07-04T04:11:33Z

add header option

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1133

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/308/

[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1133

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2894/

[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1133#discussion_r125681145

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -441,6 +441,38 @@ case class LoadTable(
val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
+
+ // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
+ // we should use table schema to generate file header.
+ val headerOption = options.get("header")
+ headerOption.isDefined match {
+ case true =>
+ // whether the csv file has file header
+ // the default value is true
+ val header = try {
+ headerOption.get.toBoolean
+ } catch {
+ case ex: IllegalArgumentException =>
+ throw new MalformedCarbonCommandException(
+ "The 'header' option is not correct. " + ex.getMessage)
+ }
+ header match {
+ case true =>
+ if (fileHeader.nonEmpty) {
+ throw new MalformedCarbonCommandException(
+ "When 'header' option is true, no need 'fileheader' option.")
--- End diff --

Suggest changing it to "When 'header' option is true, 'fileheader' option is not required".
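The check this message belongs to can be sketched in isolation as follows. `MalformedCommandException` is a local stand-in for CarbonData's `MalformedCarbonCommandException`, and `HeaderConflictCheck.check` is a hypothetical name. The message text is the one suggested above.

```scala
// Stand-in for CarbonData's MalformedCarbonCommandException, so the sketch compiles on its own.
class MalformedCommandException(message: String) extends Exception(message)

object HeaderConflictCheck {
  // Rejects a load that declares a header row in the CSV but also supplies 'fileheader'.
  def check(header: Boolean, fileHeader: String): Unit =
    if (header && fileHeader.nonEmpty) {
      throw new MalformedCommandException(
        "When 'header' option is true, 'fileheader' option is not required")
    }
}
```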

[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1133#discussion_r125682625

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithFileHeaderException.scala ---
@@ -56,6 +56,53 @@ class TestLoadDataWithFileHeaderException extends QueryTest with BeforeAndAfterA
}
}

+ test("test load data with header=false, but without fileheader") {
--- End diff --

I think one more test case can be added, using an invalid value for header.
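For such a test, it helps to pin down which strings count as invalid. The diff parses the option with Scala's `String.toBoolean`. That method accepts only `true` or `false`, ignoring case, and throws `IllegalArgumentException` for anything else. The sketch below lists candidate values. `HeaderValueCases` is a hypothetical name, and the list is illustrative, not the test the PR added.

```scala
import scala.util.Try

object HeaderValueCases {
  // (value, expected to be accepted by String.toBoolean)
  val cases: Seq[(String, Boolean)] = Seq(
    "true"    -> true,  // accepted
    "FALSE"   -> true,  // accepted: toBoolean ignores case
    "invalid" -> false, // rejected: should hit the error path
    "yes"     -> false, // rejected: only true/false are recognised
    ""        -> false  // rejected: empty string
  )

  // True when the value parses as a boolean the same way the diff parses 'header'.
  def isAccepted(value: String): Boolean = Try(value.toBoolean).isSuccess
}
```

Any of the rejected values could drive the extra test case, which would expect the load to fail.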

[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1133#discussion_r125682912

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -441,6 +441,38 @@ case class LoadTable(
val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
+
+ // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
+ // we should use table schema to generate file header.
+ val headerOption = options.get("header")
+ headerOption.isDefined match {
--- End diff --

I feel it is better to use an if check here, since the false case is empty.
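A sketch of the suggested refactoring, reduced to the parsing step. `HeaderOptionStyle` and its method names are hypothetical. All three versions behave the same way: they do nothing when `header` is absent and throw `IllegalArgumentException` when it is not a boolean.

```scala
object HeaderOptionStyle {
  // Before (shape taken from the diff): a Boolean match whose false branch is empty.
  def validateWithMatch(options: Map[String, String]): Unit = {
    val headerOption = options.get("header")
    headerOption.isDefined match {
      case true  => headerOption.get.toBoolean // throws IllegalArgumentException if not a boolean
      case false => // nothing to do
    }
  }

  // After: the reviewer's suggestion, a plain if with no else branch.
  def validateWithIf(options: Map[String, String]): Unit = {
    val headerOption = options.get("header")
    if (headerOption.isDefined) {
      headerOption.get.toBoolean
    }
  }

  // Alternative idiom: Option.foreach runs the body only when the option is present.
  def validateWithForeach(options: Map[String, String]): Unit =
    options.get("header").foreach(_.toBoolean)
}
```

`Option.foreach` is a common Scala alternative to an `isDefined` check followed by `.get`, since it never calls `get` explicitly.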

[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1133#discussion_r125683302

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -441,6 +441,38 @@ case class LoadTable(
val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
+
+ // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
+ // we should use table schema to generate file header.
+ val headerOption = options.get("header")
+ headerOption.isDefined match {
+ case true =>
+ // whether the csv file has file header
+ // the default value is true
+ val header = try {
+ headerOption.get.toBoolean
+ } catch {
+ case ex: IllegalArgumentException =>
+ throw new MalformedCarbonCommandException(
+ "The 'header' option is not correct. " + ex.getMessage)
--- End diff --

Suggest changing it to "'header' option should be either 'true' or 'false'".
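A sketch of the parsing step with the suggested message. It assumes the original `IllegalArgumentException` is kept as the cause, so the offending value is not lost once `ex.getMessage` is no longer appended. `MalformedCommandException` stands in for CarbonData's `MalformedCarbonCommandException`, and `HeaderValueParser.parse` is a hypothetical name.

```scala
// Stand-in for CarbonData's MalformedCarbonCommandException; the cause keeps the raw parser error.
class MalformedCommandException(message: String, cause: Throwable)
  extends Exception(message, cause)

object HeaderValueParser {
  // Parses 'header' the way the diff does (String.toBoolean, case-insensitive),
  // but reports the reviewer's suggested message instead of the raw parser error.
  def parse(value: String): Boolean =
    try value.toBoolean
    catch {
      case e: IllegalArgumentException =>
        throw new MalformedCommandException(
          "'header' option should be either 'true' or 'false'", e)
    }
}
```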

[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1133

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/353/

[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1133

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2940/

[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1133

Fixed all review comments.

[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1133

LGTM

[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

Github user QiangCai closed the pull request at:

https://github.com/apache/carbondata/pull/1133
