     [ https://issues.apache.org/jira/browse/CARBONDATA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geetika Gupta updated CARBONDATA-658:
-------------------------------------
    Description: 
I tried to load data into a table that has a BigInt column. First I loaded small bigint values into the table and noted the carbondata file size; then I loaded maximum bigint values and noted the file size again.

For the large bigint values the carbondata file size was 684.25 KB, while for the small bigint values it was 684.26 KB, so I could not tell whether compression was actually being applied.

I tried the same scenario with the Int datatype. For the large int values the carbondata file size was 684.24 KB, and for the small int values it was 684.26 KB.

Below are the queries:

For the BigInt table:
Create table test(a BigInt, b String) stored by 'carbondata';
LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_LargeBigInt.csv' into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_SmallBigInt.csv' into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');

For the Int table:
Create table test(a Int, b String) stored by 'carbondata';
LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_LargeInt.csv' into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_SmallInt.csv' into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');

  was: I tried to load data into a table having bigInt as a column. Firstly I loaded small bigint values to the table and noted down the carbondata file size then I loaded max bigint values to the table and again noted the carbondata file size.

     Attachment: 100000_SmallInt.csv
                 100000_LargeInt.csv
                 100000_SmallBigInt.csv
                 100000_LargeBigInt.csv
    Environment: spark 1.6, 2.0  (was: spark 1.6)
        Summary: Compression is not working for BigInt and Int datatype  (was: Compression is not working for BigInt and Int)

> Compression is not working for BigInt and Int datatype
> -------------------------------------------------------
>
>                 Key: CARBONDATA-658
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-658
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 1.0.0-incubating
>         Environment: spark 1.6, 2.0
>            Reporter: Geetika Gupta
>         Attachments: 100000_LargeBigInt.csv, 100000_LargeInt.csv, 100000_SmallBigInt.csv, 100000_SmallInt.csv
>
> I tried to load data into a table that has a BigInt column. First I loaded small bigint values into the table and noted the carbondata file size; then I loaded maximum bigint values and noted the file size again.
> For the large bigint values the carbondata file size was 684.25 KB, while for the small bigint values it was 684.26 KB, so I could not tell whether compression was actually being applied.
> I tried the same scenario with the Int datatype. For the large int values the carbondata file size was 684.24 KB, and for the small int values it was 684.26 KB.
> Below are the queries:
>
> For the BigInt table:
> Create table test(a BigInt, b String) stored by 'carbondata';
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_LargeBigInt.csv' into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_SmallBigInt.csv' into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
>
> For the Int table:
> Create table test(a Int, b String) stored by 'carbondata';
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_LargeInt.csv' into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_SmallInt.csv' into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
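The attached CSVs are not reproduced here. For illustration only, a minimal sketch of how input files of the same shape could be generated, assuming column b is a short string and column a holds the integer under test (the actual attachment contents may differ):

import csv
import random

ROWS = 100000
INT_MAX = 2147483647              # max signed 32-bit value
BIGINT_MAX = 9223372036854775807  # max signed 64-bit value

def write_csv(path, low, high):
    # Column order matches FILEHEADER='b,a': a short string, then the integer.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(ROWS):
            writer.writerow(["row%d" % i, random.randint(low, high)])

write_csv("100000_SmallBigInt.csv", 0, 1000)                       # small values
write_csv("100000_LargeBigInt.csv", BIGINT_MAX - 1000, BIGINT_MAX) # near-max values
write_csv("100000_SmallInt.csv", 0, 1000)
write_csv("100000_LargeInt.csv", INT_MAX - 1000, INT_MAX)

If value-level compression of the numeric column (for example a delta or adaptive encoding) were taking effect, the segment loaded from a small-value file would be expected to come out noticeably smaller than the one loaded from a near-max file, which is exactly the comparison the report makes.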
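The report does not say how the sizes were read. Assuming the store layout used by CarbonData 1.0 (<storeLocation>/<database>/<table>/Fact/Part0/Segment_N, where the store location comes from carbon.storelocation and each Segment_N directory corresponds to one load), the per-load sizes can be compared with hdfs dfs -du -h on the table's Fact/Part0 directory; the .carbondata files inside each segment are the ones whose sizes are quoted above.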
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)