Geetika Gupta created CARBONDATA-1902:
-----------------------------------------
Summary: Different data is loaded in hive and carbondata
Key: CARBONDATA-1902
URL: https://issues.apache.org/jira/browse/CARBONDATA-1902
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 1.3.0
Environment: spark2.1
Reporter: Geetika Gupta
Attachments: supportBooleanOnlyBoolean.csv
Create a table in CarbonData and load data into it using the following commands:
CREATE TABLE if not exists carbon_table(booleanField BOOLEAN) STORED BY 'carbondata'
LOAD DATA LOCAL INPATH '/path/supportBooleanOnlyBoolean.csv' INTO TABLE carbon_table
OPTIONS('FILEHEADER' = 'booleanField','bad_records_action'='force')
select * from carbon_table
Load the same file into a Hive table:
CREATE TABLE if not exists carbon_table_hive(booleanField BOOLEAN) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOAD DATA LOCAL INPATH '/path/supportBooleanOnlyBoolean.csv'
INTO TABLE carbon_table_hive
select * from carbon_table_hive
Running select on both tables returns different data.
Output in CarbonData:
+------------+
|booleanfield|
+------------+
|        true|
|        true|
|        true|
|        true|
|       false|
|       false|
|       false|
|       false|
|        null|
|        null|
|        null|
+------------+
Output in Hive:
+------------+
|booleanField|
+------------+
|        true|
|        true|
|        true|
|        null|
|       false|
|       false|
|       false|
|        null|
|        null|
|        null|
|        null|
+------------+
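To pinpoint where the two result sets diverge, they can be compared row by row. The following is a minimal Python sketch, assuming both queries return rows in file order (neither engine guarantees this without an ORDER BY). The lists are transcribed from the output above, and `diff_rows` is a hypothetical helper name, not part of CarbonData or Hive.

```python
# Rows transcribed from the two result tables above; None stands for null.
carbon_rows = [True, True, True, True, False, False, False, False, None, None, None]
hive_rows = [True, True, True, None, False, False, False, None, None, None, None]


def diff_rows(left, right):
    """Return (1-based row number, left value, right value) for each mismatch."""
    return [
        (number, a, b)
        for number, (a, b) in enumerate(zip(left, right), start=1)
        if a != b
    ]


mismatches = diff_rows(carbon_rows, hive_rows)
for row_number, carbon_value, hive_value in mismatches:
    print(f"row {row_number}: carbondata={carbon_value} hive={hive_value}")
```

Under that ordering assumption, only rows 4 and 8 differ: CarbonData loads a boolean value where Hive loads null.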
This might be caused by the default quotechar property of the load command.
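The quotechar hypothesis can be illustrated outside of either engine. The sketch below uses Python's standard csv module, not CarbonData or Hive code. The sample rows are invented for illustration (the actual contents of supportBooleanOnlyBoolean.csv are not reproduced here), and `to_boolean` and `parse` are hypothetical helpers that only approximate how a loader might convert a field to BOOLEAN.

```python
import csv
import io

# Invented sample: two bare values and two quoted values (an assumption,
# not the contents of the attached CSV file).
SAMPLE = 'true\n"true"\nfalse\n"false"\n'


def to_boolean(field):
    """Hypothetical conversion: anything other than true/false becomes None (null)."""
    text = field.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None


def parse(sample, honor_quotes):
    """Parse one-column CSV text, with or without quote-character handling."""
    if honor_quotes:
        reader = csv.reader(io.StringIO(sample), quotechar='"')
    else:
        # QUOTE_NONE leaves the quote characters inside the field value.
        reader = csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONE)
    return [to_boolean(row[0]) for row in reader]


with_quotes = parse(SAMPLE, honor_quotes=True)
without_quotes = parse(SAMPLE, honor_quotes=False)
print(with_quotes)
print(without_quotes)
```

If the attached file contains quoted values such as "true", a loader that strips the quote character would read them as booleans, while one that keeps the quotes would read them as invalid values and store null. Whether this is what actually happens in CarbonData and Hive still needs to be confirmed against the file.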
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)