[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...


qiuchenjian-2
GitHub user QiangCai opened a pull request:

    https://github.com/apache/carbondata/pull/1133

    [CARBONDATA-1261] Load data sql add 'header' option

    When loading CSV files that have no header row and whose columns match the table schema, add 'header'='false' to the load data SQL; the user then does not need to provide the file header.
   
    Mailing list discussion:
    http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Add-HEADER-option-to-load-data-sql-td17080.html
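
For illustration, a minimal Scala sketch of the load statement this option enables. The helper name `loadDataSql`, the path, and the table name are hypothetical; only the 'header'='false' option itself comes from this pull request.

```scala
object LoadSqlSketch {
  // Hypothetical helper: renders a LOAD DATA statement with an OPTIONS clause.
  // Options are sorted by key only to make the rendered text deterministic.
  def loadDataSql(path: String, table: String, options: Map[String, String]): String = {
    val rendered = options.toSeq
      .sortBy { case (key, _) => key }
      .map { case (key, value) => s"'$key'='$value'" }
      .mkString(", ")
    s"LOAD DATA INPATH '$path' INTO TABLE $table OPTIONS($rendered)"
  }
}
```

Passing `Map("header" -> "false")` declares that the CSV has no header row, so the column names are expected to come from the table schema rather than from a 'fileheader' option.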

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/carbondata addheaderoption

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1133.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1133
   
----
commit a065166776d1c9c63c2cd2080265553c61c49846
Author: QiangCai <[hidden email]>
Date:   2017-07-04T04:11:33Z

    add header option

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1133
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/308/




[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1133
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2894/




[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1133#discussion_r125681145
 
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -441,6 +441,38 @@ case class LoadTable(
           val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
           val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
           ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
    +
    +      // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
    +      // we should use table schema to generate file header.
    +      val headerOption = options.get("header")
    +      headerOption.isDefined match {
    +        case true =>
    +          // whether the csv file has file header
    +          // the default value is true
    +          val header = try {
    +            headerOption.get.toBoolean
    +          } catch {
    +            case ex: IllegalArgumentException =>
    +              throw new MalformedCarbonCommandException(
    +                "The 'header' option is not correct. " + ex.getMessage)
    +          }
    +          header match {
    +            case true =>
    +              if (fileHeader.nonEmpty) {
    +                throw new MalformedCarbonCommandException(
    +                  "When 'header' option is true, no need 'fileheader' option.")
    --- End diff --
   
    Suggest changing the message to "When 'header' option is true, 'fileheader' option is not required"
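
A minimal sketch of the rule this message guards, written as a standalone function so it can be read without the rest of LoadTable. The names `resolve` and `schemaColumns`, the stand-in exception class, the comma separator, and the behaviour when 'header' is false and 'fileheader' is also given are assumptions; the truncated diff above does not show them.

```scala
object FileHeaderRuleSketch {
  // Stand-in for MalformedCarbonCommandException so the sketch compiles on its own.
  final class MalformedCommandException(message: String) extends Exception(message)

  // header:        parsed 'header' option (true means the first CSV line is a header row)
  // fileHeader:    value of the 'fileheader' option, empty when not given
  // schemaColumns: table column names in schema order (hypothetical input)
  // Returns Some(header line to use), or None when the header is read from the CSV itself.
  def resolve(header: Boolean, fileHeader: String, schemaColumns: Seq[String]): Option[String] = {
    if (header) {
      if (fileHeader.nonEmpty) {
        throw new MalformedCommandException(
          "When 'header' option is true, 'fileheader' option is not required")
      }
      None
    } else if (fileHeader.nonEmpty) {
      // Assumption: an explicit 'fileheader' is used when the file has no header row.
      Some(fileHeader)
    } else {
      // No header row and no 'fileheader': generate the header from the table schema.
      Some(schemaColumns.mkString(","))
    }
  }
}
```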



[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1133#discussion_r125682625
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithFileHeaderException.scala ---
    @@ -56,6 +56,53 @@ class TestLoadDataWithFileHeaderException extends QueryTest with BeforeAndAfterA
         }
       }
     
    +  test("test load data with header=false, but without fileheader") {
    --- End diff --
   
    I think one more test case should be added, using an invalid value for header



[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1133#discussion_r125682912
 
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -441,6 +441,38 @@ case class LoadTable(
           val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
           val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
           ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
    +
    +      // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
    +      // we should use table schema to generate file header.
    +      val headerOption = options.get("header")
    +      headerOption.isDefined match {
    --- End diff --
   
    I think an if check is better here, since the false case is empty
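
To make the suggestion concrete, a small sketch comparing the two shapes. The `log` buffer and both function names are hypothetical and stand in for whatever the true branch actually does; the point is only that the two forms behave the same and the if form drops the empty branch.

```scala
import scala.collection.mutable.ListBuffer

object IfCheckSketch {
  // Shape used in the diff: match on a Boolean, with an empty false case.
  def viaMatch(options: Map[String, String], log: ListBuffer[String]): Unit = {
    val headerOption = options.get("header")
    headerOption.isDefined match {
      case true => log += s"header=${headerOption.get}"
      case false =>
    }
  }

  // Suggested shape: a plain if check, no empty branch to read past.
  def viaIf(options: Map[String, String], log: ListBuffer[String]): Unit = {
    val headerOption = options.get("header")
    if (headerOption.isDefined) {
      log += s"header=${headerOption.get}"
    }
  }
}
```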



[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1133#discussion_r125683302
 
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -441,6 +441,38 @@ case class LoadTable(
           val batchSortSizeInMB = options.getOrElse("batch_sort_size_inmb", null)
           val globalSortPartitions = options.getOrElse("global_sort_partitions", null)
           ValidateUtil.validateGlobalSortPartitions(globalSortPartitions)
    +
    +      // if there isn't file header in csv file and load sql doesn't provide FILEHEADER option,
    +      // we should use table schema to generate file header.
    +      val headerOption = options.get("header")
    +      headerOption.isDefined match {
    +        case true =>
    +          // whether the csv file has file header
    +          // the default value is true
    +          val header = try {
    +            headerOption.get.toBoolean
    +          } catch {
    +            case ex: IllegalArgumentException =>
    +              throw new MalformedCarbonCommandException(
    +                "The 'header' option is not correct. " + ex.getMessage)
    --- End diff --
   
    Suggest changing this to "'header' option should be either 'true' or 'false'"
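
A sketch of the parsing step with the suggested wording, assuming a stand-in exception class in place of MalformedCarbonCommandException and a hypothetical function name `parseHeader`. Scala's `String.toBoolean` accepts "true" and "false" in any letter case and throws IllegalArgumentException for anything else, which is what the catch relies on.

```scala
object HeaderParseSketch {
  // Stand-in for MalformedCarbonCommandException so the sketch compiles on its own.
  final class MalformedCommandException(message: String) extends Exception(message)

  // Parses the raw 'header' option value, replacing the generic
  // IllegalArgumentException with a message that names the valid values.
  def parseHeader(raw: String): Boolean =
    try {
      raw.toBoolean
    } catch {
      case _: IllegalArgumentException =>
        throw new MalformedCommandException(
          "'header' option should be either 'true' or 'false'")
    }
}
```

A value such as 'header'='invalid' would then fail with a message that tells the user exactly what to pass.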



[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1133
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/353/




[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1133
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2940/




[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

qiuchenjian-2
Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/1133
 
    Addressed all review comments.



[GitHub] carbondata issue #1133: [CARBONDATA-1261] Load data sql add 'header' option

qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1133
 
    LGTM



[GitHub] carbondata pull request #1133: [CARBONDATA-1261] Load data sql add 'header' ...

qiuchenjian-2
Github user QiangCai closed the pull request at:

    https://github.com/apache/carbondata/pull/1133

