[GitHub] [carbondata] jack86596 opened a new pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400679006
 
 

 ##########
 File path: integration/presto/src/test/scala/org/apache/carbondata/presto/util/CarbonDataStoreCreator.scala
 ##########
 @@ -270,6 +270,7 @@ object CarbonDataStoreCreator {
     CSVInputFormat.setEscapeCharacter(configuration, loadModel.getEscapeChar)
     CSVInputFormat.setHeaderExtractionEnabled(configuration, true)
     CSVInputFormat.setQuoteCharacter(configuration, loadModel.getQuoteChar)
+    CSVInputFormat.setLineSeparator(configuration, loadModel.getLineSeparator)
 
 Review comment:
   Set the line separator only if it is not null; otherwise it may lead to wrong behavior.
   You already do this in `CommonUtil.scala`; follow the same approach here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400679823
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/catalyst/CarbonParserUtil.scala
 ##########
 @@ -957,6 +958,16 @@ object CarbonParserUtil {
       }
     }
 
+    // Validate LINE_SEPARATOR length
+    if (options.exists(_._1.equalsIgnoreCase("LINE_SEPARATOR"))) {
+      val line_separator: String = CarbonUtil.unescapeChar(
+        options.get("line_separator").get.head._2)
+      if (line_separator.isEmpty || line_separator.length > 2) {
 
 Review comment:
   Do we also need to validate that the value is a supported line separator?
   For example, if I configure 'a', the load does not fail now. It would be better to fail.
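To make the two options concrete, here is a minimal Java sketch that contrasts the length-only check from the diff with a stricter allow-list check. The class name `LineSeparatorValidation`, the `unescape` helper (a simplified stand-in for `CarbonUtil.unescapeChar`, whose exact behavior is not shown in this thread) and the contents of the allow-list are assumptions for illustration, not CarbonData code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LineSeparatorValidation {

  // Hypothetical allow-list for the stricter variant: LF, CR and CRLF only.
  private static final Set<String> SUPPORTED =
      new HashSet<>(Arrays.asList("\n", "\r", "\r\n"));

  // Simplified stand-in for CarbonUtil.unescapeChar (assumption): it only
  // turns the two-character sequences \r, \n and \t into control characters.
  public static String unescape(String value) {
    return value.replace("\\r", "\r").replace("\\n", "\n").replace("\\t", "\t");
  }

  // Length-only rule, as in the diff: one or two characters after unescaping.
  public static boolean hasValidLength(String rawOption) {
    String separator = unescape(rawOption);
    return !separator.isEmpty() && separator.length() <= 2;
  }

  // Stricter rule: the unescaped value must be in the allow-list.
  public static boolean isSupported(String rawOption) {
    return SUPPORTED.contains(unescape(rawOption));
  }

  public static void main(String[] args) {
    // Option values as a user would type them in the OPTIONS clause.
    for (String option : new String[] {"\\n", "\\r\\n", "a", "ab", "", "\\r\\na"}) {
      System.out.println("'" + option + "' length-ok=" + hasValidLength(option)
          + " supported=" + isSupported(option));
    }
  }
}
```

Under the length-only rule 'a' and 'ab' are accepted; under the allow-list they are rejected. Which behavior is wanted is the open question in this comment.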


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400681109
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ##########
 @@ -35,6 +36,15 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
     sql("drop table if exists TestLoadTableOptions")
   }
 
+  override def beforeEach(): Unit = {
+    sql("drop table if exists carriage_return_in_string")
 
 Review comment:
   The test cases that existed before this PR don't need this table, so adding it to `beforeEach` is not useful for them.
   I suggest keeping it inside the test case and combining the multiple new test cases.


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400682260
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ##########
 @@ -77,4 +87,54 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
       Row(1, "2015/7/23", "ind", "aaa1", "phone197", "ASD69643a", 15000))
   }
 
+  test("test load data with line separator option value as Linux/Unix \"\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+         |carriage_return_in_string OPTIONS ('fileheader'='id, name, city', 'line_separator'='\\n')"""
+        .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3"))
+  }
+
+  test("test load data without line separator option") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city')"""
+      .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2", null))
+  }
+
+  test("test load data with line separator option value as Windows \"\\r\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='\\r\\n')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as any two characters \"ab\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
 
 Review comment:
   Shouldn't this case throw an exception, since it is an invalid line separator?


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400683066
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ##########
 @@ -77,4 +87,54 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
       Row(1, "2015/7/23", "ind", "aaa1", "phone197", "ASD69643a", 15000))
   }
 
+  test("test load data with line separator option value as Linux/Unix \"\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+         |carriage_return_in_string OPTIONS ('fileheader'='id, name, city', 'line_separator'='\\n')"""
+        .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3"))
+  }
+
+  test("test load data without line separator option") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city')"""
+      .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2", null))
+  }
+
+  test("test load data with line separator option value as Windows \"\\r\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='\\r\\n')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as any two characters \"ab\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='ab')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as empty") {
+    val exception = intercept[MalformedCarbonCommandException] {
+      sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='')""".stripMargin.replace('\n', ' '));
+    }
+    assert(exception.getMessage.contains("LINE_SEPARATOR can be only one or two characters."))
+  }
+
+  test("test load data with line separator option value as more then two characters" +
 
 Review comment:
   * more **than**


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400683302
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ##########
 @@ -77,4 +87,54 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
       Row(1, "2015/7/23", "ind", "aaa1", "phone197", "ASD69643a", 15000))
   }
 
+  test("test load data with line separator option value as Linux/Unix \"\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+         |carriage_return_in_string OPTIONS ('fileheader'='id, name, city', 'line_separator'='\\n')"""
+        .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3"))
+  }
+
+  test("test load data without line separator option") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city')"""
+      .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2", null))
+  }
+
+  test("test load data with line separator option value as Windows \"\\r\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='\\r\\n')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as any two characters \"ab\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='ab')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as empty") {
+    val exception = intercept[MalformedCarbonCommandException] {
+      sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='')""".stripMargin.replace('\n', ' '));
+    }
+    assert(exception.getMessage.contains("LINE_SEPARATOR can be only one or two characters."))
+  }
+
+  test("test load data with line separator option value as more then two characters" +
+       " \"\\r\\na\"") {
+    val exception = intercept[MalformedCarbonCommandException] {
+      sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
 
 Review comment:
   Combine the negative-scenario test cases into one test case.


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400685113
 
 

 ##########
 File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
 ##########
 @@ -256,6 +259,11 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
         if (escapeChar.length() > 1 && !CarbonLoaderUtil.isValidEscapeSequence(escapeChar)) {
           throw new IllegalArgumentException("ESCAPECHAR cannot be more than one character.");
         }
+      } else if (entry.getKey().equalsIgnoreCase("line_separator")) {
+        String lineSeparator = CarbonUtil.unescapeChar(entry.getValue());
+        if (lineSeparator.isEmpty() || lineSeparator.length() > 2) {
 
 Review comment:
   Same as the comments above: we need to validate whether the value is a valid line separator.


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400685368
 
 

 ##########
 File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
 ##########
 @@ -194,6 +194,7 @@ public CarbonWriterBuilder uniqueIdentifier(long timestamp) {
    *                h. quotechar
    *                i. escapechar
    *                j. fileheader
+   *                k. line_separator
 
 Review comment:
   The sdk-guide.md document also needs to be updated.


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400686177
 
 

 ##########
 File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
 ##########
 @@ -224,7 +226,8 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
           !option.equalsIgnoreCase("quotechar") &&
           !option.equalsIgnoreCase("escapechar") &&
           !option.equalsIgnoreCase("binary_decoder") &&
-          !option.equalsIgnoreCase("fileheader")) {
+          !option.equalsIgnoreCase("fileheader") &&
+          !option.equalsIgnoreCase("line_separator")) {
 
 Review comment:
   Can you please add one test case for the SDK?


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400690367
 
 

 ##########
 File path: integration/presto/src/test/scala/org/apache/carbondata/presto/util/CarbonDataStoreCreator.scala
 ##########
 @@ -270,6 +270,7 @@ object CarbonDataStoreCreator {
     CSVInputFormat.setEscapeCharacter(configuration, loadModel.getEscapeChar)
     CSVInputFormat.setHeaderExtractionEnabled(configuration, true)
     CSVInputFormat.setQuoteCharacter(configuration, loadModel.getQuoteChar)
+    CSVInputFormat.setLineSeparator(configuration, loadModel.getLineSeparator)
 
 Review comment:
   > If not null, then only set the line separator. Else it may lead to wrong behavior.
   > You already did like this in `CommonUtil.scala`
   > Follow the same
   
   `public static void setLineSeparator(Configuration configuration, String lineSeparator) {
       if (lineSeparator != null && !lineSeparator.isEmpty()) {
         configuration.set(LINE_SEPARATOR, lineSeparator);
       }
     }`
   This check is already done inside `setLineSeparator`.
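As a rough illustration of why the caller does not need its own null check, the sketch below applies the same guard with a plain `Map` standing in for the Hadoop `Configuration`. The class name `GuardedSetterSketch` and the key string are hypothetical; the real value of the `LINE_SEPARATOR` constant is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class GuardedSetterSketch {

  // Hypothetical key name, used only for this illustration.
  public static final String LINE_SEPARATOR = "example.csv.lineSeparator";

  // Same guard as the quoted method: null and empty values are ignored,
  // so the configuration keeps whatever default the reader would use.
  public static void setLineSeparator(Map<String, String> configuration,
      String lineSeparator) {
    if (lineSeparator != null && !lineSeparator.isEmpty()) {
      configuration.put(LINE_SEPARATOR, lineSeparator);
    }
  }

  public static void main(String[] args) {
    Map<String, String> configuration = new HashMap<>();
    setLineSeparator(configuration, null);    // ignored
    setLineSeparator(configuration, "");      // ignored
    System.out.println(configuration.containsKey(LINE_SEPARATOR));
    setLineSeparator(configuration, "\r\n");  // stored
    System.out.println(configuration.containsKey(LINE_SEPARATOR));
  }
}
```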


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400690680
 
 

 ##########
 File path: hadoop/src/main/java/org/apache/carbondata/hadoop/testutil/StoreCreator.java
 ##########
 @@ -407,6 +407,7 @@ public static void loadData(CarbonLoadModel loadModel, String storeLocation)
     CSVInputFormat.setNumberOfColumns(
         configuration, String.valueOf(loadModel.getCsvHeaderColumns().length));
     CSVInputFormat.setMaxColumns(configuration, "10");
+    CSVInputFormat.setLineSeparator(configuration, loadModel.getLineSeparator());
 
 Review comment:
   > If not null, then only set the line separator. Else it may lead to wrong behavior.
   > You already did like this in `CommonUtil.scala`
   > Follow the same
   
   `public static void setLineSeparator(Configuration configuration, String lineSeparator) {
       if (lineSeparator != null && !lineSeparator.isEmpty()) {
         configuration.set(LINE_SEPARATOR, lineSeparator);
       }
     }`
   This check is already done inside `setLineSeparator`.


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400692411
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/catalyst/CarbonParserUtil.scala
 ##########
 @@ -957,6 +958,16 @@ object CarbonParserUtil {
       }
     }
 
+    // Validate LINE_SEPARATOR length
+    if (options.exists(_._1.equalsIgnoreCase("LINE_SEPARATOR"))) {
+      val line_separator: String = CarbonUtil.unescapeChar(
+        options.get("line_separator").get.head._2)
+      if (line_separator.isEmpty || line_separator.length > 2) {
 
 Review comment:
   > do we need to have validations for supported line separators also ?
   > because for example, If I configured 'a' , it will not fail now. better to fail.
   
   The Univocity parser allows any one or two characters to be the line separator, so we can just do the same. A user may want to use 'a' as the line separator.
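To show what an arbitrary separator such as 'a' means for record boundaries, here is a small sketch using plain Java string splitting. It is not the Univocity API and it ignores quoting and escaping; the class name `CustomSeparatorSplit` and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CustomSeparatorSplit {

  // Splits raw text into records on a literal separator of any content.
  // Pattern.quote keeps characters such as '|' or '.' from being read as
  // regex syntax; the -1 limit keeps trailing empty records.
  public static List<String> splitRecords(String text, String lineSeparator) {
    return Arrays.asList(text.split(Pattern.quote(lineSeparator), -1));
  }

  public static void main(String[] args) {
    // With 'a' as the separator, the text holds two records.
    System.out.println(splitRecords("1,x,ya2,p,q", "a"));
    // With "\n" as the separator, a "\r" inside a field stays in the data.
    System.out.println(splitRecords("1,x\r,y\n2,p,q", "\n").size());
  }
}
```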


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400696125
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ##########
 @@ -77,4 +87,54 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
       Row(1, "2015/7/23", "ind", "aaa1", "phone197", "ASD69643a", 15000))
   }
 
+  test("test load data with line separator option value as Linux/Unix \"\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+         |carriage_return_in_string OPTIONS ('fileheader'='id, name, city', 'line_separator'='\\n')"""
+        .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3"))
+  }
+
+  test("test load data without line separator option") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city')"""
+      .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2", null))
+  }
+
+  test("test load data with line separator option value as Windows \"\\r\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='\\r\\n')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as any two characters \"ab\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
 
 Review comment:
   > In this case it should throw an exception ? as it is invalid line separator?
   
   The Univocity parser allows any one or two characters to be the line separator, so we can just do the same. A user may want to use 'a' as the line separator.


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400696454
 
 

 ##########
 File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
 ##########
 @@ -256,6 +259,11 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
         if (escapeChar.length() > 1 && !CarbonLoaderUtil.isValidEscapeSequence(escapeChar)) {
           throw new IllegalArgumentException("ESCAPECHAR cannot be more than one character.");
         }
+      } else if (entry.getKey().equalsIgnoreCase("line_separator")) {
+        String lineSeparator = CarbonUtil.unescapeChar(entry.getValue());
+        if (lineSeparator.isEmpty() || lineSeparator.length() > 2) {
 
 Review comment:
   > same as above comments, need to validate whether line separator is validate line separator or not
   
   The Univocity parser allows any one or two characters to be the line separator, so we can just do the same. A user may want to use 'a' as the line separator.


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400696454
 
 

 ##########
 File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
 ##########
 @@ -256,6 +259,11 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
         if (escapeChar.length() > 1 && !CarbonLoaderUtil.isValidEscapeSequence(escapeChar)) {
           throw new IllegalArgumentException("ESCAPECHAR cannot be more than one character.");
         }
+      } else if (entry.getKey().equalsIgnoreCase("line_separator")) {
+        String lineSeparator = CarbonUtil.unescapeChar(entry.getValue());
+        if (lineSeparator.isEmpty() || lineSeparator.length() > 2) {
 
 Review comment:
   > same as above comments, need to validate whether line separator is validate line separator or not
   
   The Univocity parser allows any one or two characters to be the line separator, so we can just do the same.


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400696125
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ##########
 @@ -77,4 +87,54 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
       Row(1, "2015/7/23", "ind", "aaa1", "phone197", "ASD69643a", 15000))
   }
 
+  test("test load data with line separator option value as Linux/Unix \"\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+         |carriage_return_in_string OPTIONS ('fileheader'='id, name, city', 'line_separator'='\\n')"""
+        .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3"))
+  }
+
+  test("test load data without line separator option") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city')"""
+      .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2", null))
+  }
+
+  test("test load data with line separator option value as Windows \"\\r\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='\\r\\n')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as any two characters \"ab\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
 
 Review comment:
   > In this case, should it throw an exception, since it is an invalid line separator?
   
   The Univocity parser allows any one or two characters to be the line separator, so we can just do the same.
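
   To make the effect of the separator choice concrete, here is a rough sketch of how the record boundaries move when the same bytes are split on different separators. Everything in it is an assumption for illustration: the content string is made up (it is not the real `carriage_return_in_string.csv`), `SeparatorDemo` is a hypothetical name, and the naive `String.split` used here ignores quoting and escaping, unlike a real CSV parser such as Univocity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SeparatorDemo {
  // Naive record splitting on a literal separator; no quote or escape handling.
  public static List<String> splitRecords(String content, String separator) {
    return Arrays.asList(content.split(Pattern.quote(separator)));
  }

  public static void main(String[] args) {
    // Made-up content: a bare \r inside a field, a bare \n, then a \r\n ending.
    String content = "1,2\r,3\n4\r\n";
    for (String separator : new String[] {"\n", "\r\n", "ab"}) {
      System.out.println(splitRecords(content, separator).size() + " record(s)");
    }
  }
}
```

   With `\n` the bare `\r` stays inside a field, with `\r\n` the bare `\n` does too, and with a separator that never occurs in the data (such as `ab`) the whole content is treated as one record.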


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401339616
 
 

 ##########
 File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
 ##########
 @@ -224,7 +226,8 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
           !option.equalsIgnoreCase("quotechar") &&
           !option.equalsIgnoreCase("escapechar") &&
           !option.equalsIgnoreCase("binary_decoder") &&
-          !option.equalsIgnoreCase("fileheader")) {
+          !option.equalsIgnoreCase("fileheader") &&
+          !option.equalsIgnoreCase("line_separator")) {
 
 Review comment:
   > Can you please add one test case for the SDK?
   
   Since the line separator is not used in the SDK, these changes have been removed.
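
   For context, the check in this hunk is a case-insensitive whitelist of option names. A minimal sketch of that idea follows, assuming a set-based lookup instead of the chained `equalsIgnoreCase` calls. `LoadOptionCheck` is a hypothetical name, only the four option names visible in the quoted hunk are listed (the real list is longer), and the exception type and message are placeholders.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class LoadOptionCheck {
  // Only the names visible in the quoted hunk; the real whitelist has more entries.
  private static final Set<String> SUPPORTED = new HashSet<>(
      Arrays.asList("quotechar", "escapechar", "binary_decoder", "fileheader"));

  // Reject any option whose lower-cased name is not in the whitelist.
  public static void checkOptions(Map<String, String> options) {
    for (String option : options.keySet()) {
      if (!SUPPORTED.contains(option.toLowerCase(Locale.ROOT))) {
        // Placeholder exception type and message, not the SDK's actual ones.
        throw new IllegalArgumentException("Unsupported option: " + option);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("QuoteChar", "\"");
    checkOptions(options); // accepted: names are matched case-insensitively
  }
}
```

   With `line_separator` left out of the whitelist, as proposed in this reply, passing it through this kind of check would be rejected.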


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401339672
 
 

 ##########
 File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
 ##########
 @@ -194,6 +194,7 @@ public CarbonWriterBuilder uniqueIdentifier(long timestamp) {
    *                h. quotechar
    *                i. escapechar
    *                j. fileheader
+   *                k. line_separator
 
 Review comment:
   > The sdk-guide.md document also needs to be updated.
   
   Since the line separator is not used in the SDK, these changes have been removed.


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401339719
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ##########
 @@ -77,4 +87,54 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
       Row(1, "2015/7/23", "ind", "aaa1", "phone197", "ASD69643a", 15000))
   }
 
+  test("test load data with line separator option value as Linux/Unix \"\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+         |carriage_return_in_string OPTIONS ('fileheader'='id, name, city', 'line_separator'='\\n')"""
+        .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3"))
+  }
+
+  test("test load data without line separator option") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city')"""
+      .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2", null))
+  }
+
+  test("test load data with line separator option value as Windows \"\\r\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='\\r\\n')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as any two characters \"ab\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='ab')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as empty") {
+    val exception = intercept[MalformedCarbonCommandException] {
+      sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='')""".stripMargin.replace('\n', ' '));
+    }
+    assert(exception.getMessage.contains("LINE_SEPARATOR can be only one or two characters."))
+  }
+
+  test("test load data with line separator option value as more then two characters" +
+       " \"\\r\\na\"") {
+    val exception = intercept[MalformedCarbonCommandException] {
+      sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
 
 Review comment:
   done


[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401340559
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ##########
 @@ -77,4 +87,54 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
       Row(1, "2015/7/23", "ind", "aaa1", "phone197", "ASD69643a", 15000))
   }
 
+  test("test load data with line separator option value as Linux/Unix \"\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+         |carriage_return_in_string OPTIONS ('fileheader'='id, name, city', 'line_separator'='\\n')"""
+        .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3"))
+  }
+
+  test("test load data without line separator option") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city')"""
+      .stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2", null))
+  }
+
+  test("test load data with line separator option value as Windows \"\\r\\n\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='\\r\\n')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as any two characters \"ab\"") {
+    sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='ab')""".stripMargin.replace('\n', ' '));
+    checkAnswer(sql("select * from carriage_return_in_string where id = 1"),
+      Row(1, "2\r", "3\n4"))
+  }
+
+  test("test load data with line separator option value as empty") {
+    val exception = intercept[MalformedCarbonCommandException] {
+      sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/carriage_return_in_string.csv' INTO TABLE
+           |carriage_return_in_string OPTIONS ('fileheader'='id, name, city',
+           |'line_separator'='')""".stripMargin.replace('\n', ' '));
+    }
+    assert(exception.getMessage.contains("LINE_SEPARATOR can be only one or two characters."))
+  }
+
+  test("test load data with line separator option value as more then two characters" +
 
 Review comment:
   done
