[jira] [Updated] (CARBONDATA-3565) Binary to string issue when loading dataframe data in NewRddIterator

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenKai updated CARBONDATA-3565:
--------------------------------
    Description:
* issue
When a Spark DataFrame (SQL) loads complex binary data into a Hive table, the data is corrupted when it is read back. In RddIterator, I see that the data is converted to a string and then converted back.

* test case
Binary data can be produced with *DataOutputStream#writeDouble* and similar methods.

* discussion
I think the *CarbonScalaUtil#getString* operation can be removed now. Digging back into the code from 2016, I found that it was used in Kettle *CsvInput* (commit: 0018756d). That code has since been removed, so I think this conversion is redundant. (UPDATE: The follow-up code in GenericParser uses this string-conversion logic, so that should be considered here.)

  was:
* issue
When a Spark DataFrame (SQL) loads complex binary data into a Hive table, the data is corrupted when it is read back. In RddIterator, I see that the data is converted to a string and then converted back.

* test case
Binary data can be produced with *DataOutputStream#writeDouble* and similar methods.

* discussion
I think the *CarbonScalaUtil#getString* operation can be removed now. Digging back into the code from 2016, I found that it was used in Kettle *CsvInput* (commit: 0018756d). That code has since been removed, so I think this conversion is redundant.


> Binary to string issue when loading dataframe data in NewRddIterator
> --------------------------------------------------------------------
>
>                 Key: CARBONDATA-3565
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3565
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 1.6.0
>            Reporter: ChenKai
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> * issue
> When a Spark DataFrame (SQL) loads complex binary data into a Hive table, the data is corrupted when it is read back. In RddIterator, I see that the data is converted to a string and then converted back.
> * test case
> Binary data can be produced with *DataOutputStream#writeDouble* and similar methods.
> * discussion
> I think the *CarbonScalaUtil#getString* operation can be removed now. Digging back into the code from 2016, I found that it was used in Kettle *CsvInput* (commit: 0018756d). That code has since been removed, so I think this conversion is redundant. (UPDATE: The follow-up code in GenericParser uses this string-conversion logic, so that should be considered here.)
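
The kind of corruption described in the issue can be reproduced outside CarbonData with plain Java. The following is a minimal sketch, not the actual CarbonData code path: the class `BinaryRoundTrip` and the helper `roundTripThroughString` are hypothetical stand-ins for the string conversion, and UTF-8 is an assumed charset. It serializes a double with *DataOutputStream#writeDouble*, decodes the bytes to a string, encodes them again, and compares the result with the original bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryRoundTrip {

    // Serialize a double as in the test case: DataOutputStream#writeDouble
    // writes the 8-byte big-endian IEEE 754 representation.
    static byte[] encodeDouble(double value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeDouble(value);
        }
        return buffer.toByteArray();
    }

    // Hypothetical stand-in for the binary -> string -> binary conversion.
    // The charset is an assumption; it is not taken from the CarbonData source.
    static byte[] roundTripThroughString(byte[] raw, Charset charset) {
        String asText = new String(raw, charset); // malformed bytes are replaced here
        return asText.getBytes(charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] original = encodeDouble(3.14);
        byte[] restored = roundTripThroughString(original, StandardCharsets.UTF_8);

        System.out.println("original: " + Arrays.toString(original));
        System.out.println("restored: " + Arrays.toString(restored));
        System.out.println("intact:   " + Arrays.equals(original, restored));
    }
}
```

For this value the two arrays differ, because bytes that are not valid UTF-8 are replaced during decoding and cannot be recovered when encoding again. Passing the `byte[]` through unchanged avoids the loss, which is in line with the suggestion above to drop the string conversion where no later step depends on it.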



--
This message was sent by Atlassian Jira
(v8.3.4#803005)