[GitHub] [carbondata] jack86596 opened a new pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
jack86596 opened a new pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664
 
 
   …ator during csv parsing.
   
    ### Why is this PR needed?
    Sometimes the univocity parser detects the line separator incorrectly. In this case, the user should be able to set the line separator explicitly.
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
     - Yes. New load option "line_separator" is added.
   
    ### Is any new testcase added?
     - Yes.
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
Zhangshunyu commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664#discussion_r390822484
 
 

 ##########
 File path: integration/spark/src/test/resources/carriagereturninstring.csv
 ##########
 @@ -0,0 +1,2 @@
+1,2^M,3
+4,5,6
 
 Review comment:
   Suggest naming the csv file something like 'carriage_return_in_string.csv'; such a long name without underscores is confusing.

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-597531968
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/712/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-597534213
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2419/
   

[GitHub] [carbondata] jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
jack86596 commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664#discussion_r390883053
 
 

 ##########
 File path: integration/spark/src/test/resources/carriagereturninstring.csv
 ##########
 @@ -0,0 +1,2 @@
+1,2^M,3
+4,5,6
 
 Review comment:
   Done

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-597598349
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2426/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
CarbonDataQA1 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-597600368
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/719/
   

[GitHub] [carbondata] akashrn5 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
akashrn5 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-603170194
 
 
   @jack86596 Please update the PR title and description to follow the proper format.

[GitHub] [carbondata] akashrn5 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…

GitBox
akashrn5 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separ…
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-603170494
 
 
   Can you please describe the actual error you got when you say `Sometime univocity parser will detect the line separator incorrectly`?

[GitHub] [carbondata] jack86596 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
jack86596 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-603794704
 
 
   > @jack86596 please update the PR heading and description in a proper format
   
   

[GitHub] [carbondata] jack86596 closed pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
jack86596 closed pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664
 
 
   

[GitHub] [carbondata] jack86596 opened a new pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
jack86596 opened a new pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664
 
 
    ### Why is this PR needed?
    Sometimes the univocity parser detects the line separator incorrectly. In this case, the user should be able to set the line separator explicitly.
    Issue: During loading, if a field in the first line contains a '\r' character and that '\r' appears before the first '\n', line separator detection treats '\r' as the line separator, which is not the intended behavior.
    Example:
    The data file has two lines (^M is '\r'):
    1,2^M,3
    4,5,6
    After loading, the records in the table are (the second value of the second row is "3\n4", a single value containing a line break):
    | 1 | 2 | null |
    | null | 3\n4 | 5 |
    The correct result should be:
    | 1 | 2^M | 3 |
    | 4 | 5 | 6 |
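A minimal, self-contained Java sketch of the failure mode described above. It is a simplified model, not univocity's real detection code: the hypothetical `detectSeparator` assumes detection just takes the first line ending it sees, and the hypothetical `parse` splits records on a given separator and fields on commas. With the detected '\r', the second record's second field becomes "3\n4"; with an explicit '\n', both records come out as expected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model, NOT univocity's real detection logic: it assumes the first
// line ending found in the data decides the separator for the whole file.
final class LineSeparatorDemo {

  /** Returns "\r\n", "\r" or "\n" depending on the first line ending seen. */
  static String detectSeparator(String data) {
    for (int i = 0; i < data.length(); i++) {
      char c = data.charAt(i);
      if (c == '\n') {
        return "\n";
      }
      if (c == '\r') {
        // "\r\n" is one separator; a lone '\r' is taken as the separator by itself.
        boolean crlf = i + 1 < data.length() && data.charAt(i + 1) == '\n';
        return crlf ? "\r\n" : "\r";
      }
    }
    return "\n"; // no line ending at all: fall back to '\n'
  }

  /** Splits data into records on the separator, then each record into comma-separated fields. */
  static List<List<String>> parse(String data, String separator) {
    if (separator.isEmpty()) {
      throw new IllegalArgumentException("separator must not be empty");
    }
    List<List<String>> records = new ArrayList<>();
    int start = 0;
    while (start < data.length()) {
      int end = data.indexOf(separator, start);
      if (end < 0) {
        end = data.length(); // last record without a trailing separator
      }
      // limit -1 keeps trailing empty fields instead of dropping them
      records.add(Arrays.asList(data.substring(start, end).split(",", -1)));
      start = end + separator.length();
    }
    return records;
  }

  public static void main(String[] args) {
    String data = "1,2\r,3\n4,5,6\n"; // first line contains a '\r' inside a field
    List<List<String>> wrong = parse(data, detectSeparator(data));
    List<List<String>> right = parse(data, "\n");
    System.out.println(wrong.get(1).get(1).replace("\n", "\\n")); // prints 3\n4
    System.out.println(right.get(1));                              // prints [4, 5, 6]
  }
}
```

In the detected case the second record actually has four fields (`""`, `3\n4`, `5`, `6\n`); the table in the description only shows the first three columns.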
   
    ### What changes were proposed in this PR?
    Allow the user to specify the line separator explicitly in the load command by adding a new load option named "line_separator".
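One way such an option could be wired up, sketched in Java under assumptions: `LineSeparatorOption`, `ParserSettingsSketch`, and the accepted escaped values (`\n`, `\r`, `\r\n`) are hypothetical and not taken from this PR. For reference, univocity's own parser settings expose `setLineSeparatorDetectionEnabled(...)` and `getFormat().setLineSeparator(...)` for the same two choices.

```java
import java.util.Map;

// Hypothetical sketch: turning a "line_separator" load option into parser settings.
// None of these names are CarbonData's actual classes.
final class LineSeparatorOption {

  /** Minimal stand-in for parser settings: either auto-detect or a fixed separator. */
  static final class ParserSettingsSketch {
    final boolean detectLineSeparator;
    final String lineSeparator; // null when auto-detection is used

    ParserSettingsSketch(boolean detectLineSeparator, String lineSeparator) {
      this.detectLineSeparator = detectLineSeparator;
      this.lineSeparator = lineSeparator;
    }
  }

  /** Translates escaped option text (e.g. backslash + 'n') into the real characters. */
  static String unescape(String value) {
    switch (value) {
      case "\\n":
        return "\n";
      case "\\r":
        return "\r";
      case "\\r\\n":
        return "\r\n";
      default:
        throw new IllegalArgumentException("Unsupported line_separator: " + value);
    }
  }

  /** An explicit option wins and disables detection; otherwise keep auto-detection. */
  static ParserSettingsSketch resolve(Map<String, String> loadOptions) {
    String raw = loadOptions.get("line_separator");
    if (raw == null || raw.isEmpty()) {
      return new ParserSettingsSketch(true, null);
    }
    return new ParserSettingsSketch(false, unescape(raw));
  }

  public static void main(String[] args) {
    ParserSettingsSketch s = resolve(Map.of("line_separator", "\\n"));
    System.out.println(s.detectLineSeparator + " " + s.lineSeparator.equals("\n"));
  }
}
```

Disabling detection whenever a value is given means a stray '\r' in the first line can no longer change how records are split, while omitting the option keeps the existing behavior.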
       
    ### Does this PR introduce any user interface change?
     - Yes. New load option "line_separator" is added.
   
    ### Is any new testcase added?
     - Yes.
   
       
   

[GitHub] [carbondata] jack86596 removed a comment on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
jack86596 removed a comment on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-603794704
 
 
   > @jack86596 please update the PR heading and description in a proper format
   
   

[GitHub] [carbondata] jack86596 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
jack86596 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-603797452
 
 
   @akashrn5 Please review, thanks.

[GitHub] [carbondata] akashrn5 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
akashrn5 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-606089836
 
 
   @jack86596 Is the behavior the same for Hive tables as well? Please check; they also use the same parser. Please check how they handle this once, then we can decide whether this is the better way for Carbon or whether there is another way. Thanks

[GitHub] [carbondata] akashrn5 edited a comment on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
akashrn5 edited a comment on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-606089836
 
 
   @jack86596 Is the behavior the same for Hive tables as well? Please check; they also use the same parser. Please check how they handle this case, and then we can decide whether this is the better way for Carbon or whether there is another way. Thanks

[GitHub] [carbondata] jack86596 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
jack86596 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-606360527
 
 
   @akashrn5 Hive tables don't have any parsing logic during loading; loading a Hive table just moves the file into the table folder directly, so the univocity parser is not used. Please correct me if I am wrong. Thanks.

[GitHub] [carbondata] akashrn5 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
akashrn5 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-606419073
 
 
   When I did select * on the table, I could not see the values 1 and 2, which looks strange. Can you check?
   ![image](https://user-images.githubusercontent.com/21074821/77992301-728bf780-7343-11ea-9e72-a49a882c9f12.png)
   

[GitHub] [carbondata] jack86596 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
jack86596 commented on issue #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#issuecomment-606423084
 
 
   @akashrn5 '\r' is a carriage return: it tells a printer or other output device, such as a system console display, to move the cursor back to the first position on the same line. 1 and 2 come before the '\r', so the text printed after it overwrites them and you cannot see them on the screen.
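To see why 1 and 2 disappear, here is a simplified Java model of a single console line (an assumption; real terminals and the Spark shell's table layout differ in detail). The hypothetical `renderLikeTerminal` moves the cursor to column 0 on '\r' and lets later characters overwrite earlier ones, while `makeVisible` escapes the control characters so the stored value can be inspected.

```java
// Simplified single-line terminal model; CarriageReturnDisplay is a hypothetical name.
final class CarriageReturnDisplay {

  /** '\r' moves the cursor to column 0; later characters overwrite what is already shown. */
  static String renderLikeTerminal(String text) {
    StringBuilder screen = new StringBuilder();
    int cursor = 0;
    for (char c : text.toCharArray()) {
      if (c == '\r') {
        cursor = 0;
        continue;
      }
      if (cursor < screen.length()) {
        screen.setCharAt(cursor, c); // overwrite an already printed character
      } else {
        screen.append(c);
      }
      cursor++;
    }
    return screen.toString();
  }

  /** Escapes '\r' and '\n' so the characters stored in a value become visible. */
  static String makeVisible(String value) {
    return value.replace("\r", "\\r").replace("\n", "\\n");
  }

  public static void main(String[] args) {
    String row = "|1   |2\r   |3   |";
    System.out.println(makeVisible(row));        // what the row text really contains
    System.out.println(renderLikeTerminal(row)); // what the console appears to show
  }
}
```

`renderLikeTerminal("|1   |2\r   |3   |")` yields `"   |3   |"`: the stored value is still `2\r`, but everything printed before the '\r' has been overwritten.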

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.

GitBox
ajantha-bhat commented on a change in pull request #3664: [CARBONDATA-3740] Add line separator option to load command to configure the line separator during csv parsing.
URL: https://github.com/apache/carbondata/pull/3664#discussion_r400678851
 
 

 ##########
 File path: hadoop/src/main/java/org/apache/carbondata/hadoop/testutil/StoreCreator.java
 ##########
 @@ -407,6 +407,7 @@ public static void loadData(CarbonLoadModel loadModel, String storeLocation)
     CSVInputFormat.setNumberOfColumns(
         configuration, String.valueOf(loadModel.getCsvHeaderColumns().length));
     CSVInputFormat.setMaxColumns(configuration, "10");
+    CSVInputFormat.setLineSeparator(configuration, loadModel.getLineSeparator());
 
 Review comment:
   Set the line separator only if it is not null; otherwise it may lead to wrong behavior.
   You already did this in `CommonUtil.scala`, so please follow the same approach here.
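A sketch of the suggested guard in Java. A plain `Map` stands in for Hadoop's `Configuration`, and `LINE_SEPARATOR_KEY` is a hypothetical key name, not the one `CSVInputFormat` actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the null guard; a Map stands in for Hadoop's Configuration.
final class LineSeparatorConfig {

  // Hypothetical property key, for illustration only.
  static final String LINE_SEPARATOR_KEY = "example.csv.line.separator";

  /** Writes the separator only when the load model has one, so detection stays the default. */
  static void setLineSeparatorIfPresent(Map<String, String> conf, String lineSeparator) {
    if (lineSeparator != null) {
      conf.put(LINE_SEPARATOR_KEY, lineSeparator);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    setLineSeparatorIfPresent(conf, null); // no option given: nothing is written
    setLineSeparatorIfPresent(conf, "\n"); // option given: the value is stored
    System.out.println(conf.containsKey(LINE_SEPARATOR_KEY));
  }
}
```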
