jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401340621 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala ########## @@ -35,6 +36,15 @@ class TestLoadOptions extends QueryTest with BeforeAndAfterAll{ sql("drop table if exists TestLoadTableOptions") } + override def beforeEach(): Unit = { + sql("drop table if exists carriage_return_in_string") Review comment: > other previous test cases before this PR, don't need this table. so adding before each is not useful for it. > Suggest to keep in inside testcase and combine multiple new test cases. done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401342584 ########## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ########## @@ -256,6 +259,11 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options) { if (escapeChar.length() > 1 && !CarbonLoaderUtil.isValidEscapeSequence(escapeChar)) { throw new IllegalArgumentException("ESCAPECHAR cannot be more than one character."); } + } else if (entry.getKey().equalsIgnoreCase("line_separator")) { + String lineSeparator = CarbonUtil.unescapeChar(entry.getValue()); + if (lineSeparator.isEmpty() || lineSeparator.length() > 2) { Review comment: > > same as above comments, need to validate whether line separator is validate line separator or not > > Univocity parser allow any one or two characters to be the line separator, we can just do the same. Since line separator has no usage in sdk, remove the changes. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401342888 ########## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/util/CarbonDataStoreCreator.scala ########## @@ -270,6 +270,7 @@ object CarbonDataStoreCreator { CSVInputFormat.setEscapeCharacter(configuration, loadModel.getEscapeChar) CSVInputFormat.setHeaderExtractionEnabled(configuration, true) CSVInputFormat.setQuoteCharacter(configuration, loadModel.getQuoteChar) + CSVInputFormat.setLineSeparator(configuration, loadModel.getLineSeparator) Review comment: > If not null, then only set the line separator. Else it may lead to wrong behavior. > You already did like this in `CommonUtil.scala` > Follow the same Because I am not familar with presto, so please help to check whether here my modification is correct or not. Thanks. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401344118 ########## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/util/CarbonDataStoreCreator.scala ########## @@ -270,6 +270,7 @@ object CarbonDataStoreCreator { CSVInputFormat.setEscapeCharacter(configuration, loadModel.getEscapeChar) CSVInputFormat.setHeaderExtractionEnabled(configuration, true) CSVInputFormat.setQuoteCharacter(configuration, loadModel.getQuoteChar) + CSVInputFormat.setLineSeparator(configuration, loadModel.getLineSeparator) Review comment: It is not a presto production code. This changes are in test files. Doesn't matter. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r401344169 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/catalyst/CarbonParserUtil.scala ########## @@ -957,6 +958,16 @@ object CarbonParserUtil { } } + // Validate LINE_SEPARATOR length + if (options.exists(_._1.equalsIgnoreCase("LINE_SEPARATOR"))) { + val line_separator: String = CarbonUtil.unescapeChar( + options.get("line_separator").get.head._2) + if (line_separator.isEmpty || line_separator.length > 2) { Review comment: ok ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-607048726 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/891/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-607055846 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2599/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
ajantha-bhat commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-607073170 LGTM. Thanks for the contribution. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
asfgit closed pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
Free forum by Nabble | Edit this page |