This post was updated on .
Hi Community,
I have created a table in hive and same in CarbonData, In my CSV there is whitespace after the data which is working fine in hive table whereas it shows null in CarbonData table. In Hive 1: Table created in Hive create table my_user(id int, name string, address1 string, address2 string,salary double, incentive int)ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; 2: Load Data in table LOAD DATA LOCAL INPATH '/home/vinod/Downloads/my_user2.csv' OVERWRITE INTO TABLE my_user; 3: Select Query on hive 0: jdbc:hive2://localhost:10000> select * from my_user; +-----+-----------+-----------+-----------+----------+------------+--+ | id | name | address1 | address2 | salary | incentive | +-----+-----------+-----------+-----------+----------+------------+--+ | 1 | 'katy' | 'london' | 'UK' | 5000.0 | 5000 | | 2 | 'John' | 'goa' | 'goa' | 12000.0 | 500 | | 3 | 'Rachel' | 'goa' | 'goa' | 10000.0 | 1000 | +-----+-----------+-----------+-----------+----------+------------+--+ 3 rows selected (0.122 seconds) In Carbon 1: Create Table in Carbon create table my_user_c(id int, name string, address1 string, address2 string,salary double, incentive int) stored by 'carbondata'; 2: Load Data in table load data inpath 'hdfs://localhost:54310/my_user2.csv' into table my_user_c; 3: Select Query on carbon 0: jdbc:hive2://localhost:10000> select * from my_user_c; +-----+-----------+-----------+-----------+----------+------------+--+ | id | name | address1 | address2 | salary | incentive | +-----+-----------+-----------+-----------+----------+------------+--+ | 2 | 'John' | 'goa' | 'goa' | 12000.0 | 500 | | 3 | 'Rachel' | 'goa' | 'goa' | 10000.0 | NULL | | 1 | 'katy' | 'london' | 'UK' | 5000.0 | 5000 | +-----+-----------+-----------+-----------+----------+------------+--+ 3 rows selected (0.193 seconds) it should display 1000 instead of NULL value in CarbonData. |
I suggest to trim white space before converting a string value to a numeric value.
Best Regards
David Cai |
Hi Vin,
Value is getting displayed as null because data is not getting trimmed. By default carbondata will not modify any data. But there is option to verify your data. 1. While loading the data set the property " BAD_RECORDS_LOGGER_ENABLE=true". This will create a bad record logger where in you can see what all records are invalid. 2. Secondly by default trimming data is disabled in the code. parserSettings.setIgnoreLeadingWhitespaces(false); parserSettings.setIgnoreTrailingWhitespaces(false); We can provide an option during data load to trim whitespaces and keep the default value as false to keep the current behavior intact. Example load data inpath 'D:/my_user2.csv' into table my_user_c OPTIONS('TRIM_WHITESPACE'='TRUE') @Dev team please share your inputs for the 2nd option. Regards Manish Gupta On Wed, May 17, 2017 at 7:47 AM, David Cai <[hidden email]> wrote: > I suggest to trim white space before converting a string value to a numeric > value. > > > > ----- > Best Regards > David Cai > -- > View this message in context: http://apache-carbondata-dev- > mailing-list-archive.1130556.n5.nabble.com/White-Space- > does-not-trim-in-CarbonData-tp12736p12751.html > Sent from the Apache CarbonData Dev Mailing List archive mailing list > archive at Nabble.com. > |
Hi,
Generally speaking, I think it is preferable to make system more tolerant to input data so that it is simpler to use from user point of view. And I think it is safe to align with other format like ORC and parquet. Regards, Jacky > 在 2017年5月17日,下午12:30,manish gupta <[hidden email]> 写道: > > Hi Vin, > > Value is getting displayed as null because data is not getting trimmed. By > default carbondata will not modify any data. But there is option to verify > your data. > > 1. While loading the data set the property " > BAD_RECORDS_LOGGER_ENABLE=true". This will create a bad record logger where > in you can see what all records are invalid. > > 2. Secondly by default trimming data is disabled in the code. > > parserSettings.setIgnoreLeadingWhitespaces(false); > parserSettings.setIgnoreTrailingWhitespaces(false); > > > We can provide an option during data load to trim whitespaces and keep the > default value as false to keep the current behavior intact. Example > load data inpath 'D:/my_user2.csv' into table my_user_c > OPTIONS('TRIM_WHITESPACE'='TRUE') > > @Dev team please share your inputs for the 2nd option. > > Regards > Manish Gupta > > > On Wed, May 17, 2017 at 7:47 AM, David Cai <[hidden email]> wrote: > >> I suggest to trim white space before converting a string value to a numeric >> value. >> >> >> >> ----- >> Best Regards >> David Cai >> -- >> View this message in context: http://apache-carbondata-dev- >> mailing-list-archive.1130556.n5.nabble.com/White-Space- >> does-not-trim-in-CarbonData-tp12736p12751.html >> Sent from the Apache CarbonData Dev Mailing List archive mailing list >> archive at Nabble.com. >> |
Free forum by Nabble | Edit this page |