
[jira] [Updated] (CARBONDATA-431) Analysis compression for numeric datatype compared with Parquet/ORC


Akash R Nilugal (Jira)


[ https://issues.apache.org/jira/browse/CARBONDATA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-431:
--------------------------------
Fix Version/s: 1.0.0-incubating

> Analysis compression for numeric datatype compared with Parquet/ORC
> -------------------------------------------------------------------
>
> Key: CARBONDATA-431
> URL: https://issues.apache.org/jira/browse/CARBONDATA-431
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: suo tong
> Assignee: Ashok Kumar
> Fix For: 1.0.0-incubating
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Comparing by data type: Carbon's string type achieves a better compression ratio than Parquet and ORC, but for numeric types ORC compresses best. We should analyze Carbon's handling of numeric data types to improve its compression ratio.
> Data type                 | Text | Parquet | ORC | Carbon
> decimal                   | 16G  | 11G     | 6G  | 13G
> int                       | 5G   | 1G      | 1G  | 3G
> String (high cardinality) | 24G  | 22G     | 11G | 3G (no dictionary)
> String (low cardinality)  | 30G  | 4G      | 4G  | 1G (dictionary encoding)
>   Carbon variants for low-cardinality String: 1G with dictionary encoding, 1G with dictionary encoding without inverted index, 3G without dictionary encoding
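As a rough illustration of what the table implies, the sketch below computes each format's compression ratio relative to the raw text size and how far Carbon trails the best of Parquet/ORC on the numeric types. `SIZES_GB` and the helper functions are hypothetical names introduced here. The figures are the rounded sizes from the table, with the low-cardinality Carbon entry taken as the 1G dictionary-encoded result, so the ratios are approximate.

```python
# Hypothetical helper data: sizes (GB) copied from the table in this issue.
# For low-cardinality String, Carbon uses the 1G dictionary-encoded figure.
SIZES_GB = {
    "decimal": {"Text": 16, "Parquet": 11, "ORC": 6, "Carbon": 13},
    "int": {"Text": 5, "Parquet": 1, "ORC": 1, "Carbon": 3},
    "String (high cardinality)": {"Text": 24, "Parquet": 22, "ORC": 11, "Carbon": 3},
    "String (low cardinality)": {"Text": 30, "Parquet": 4, "ORC": 4, "Carbon": 1},
}


def compression_ratios(sizes):
    """Map each data type to {format: text_size / format_size}, excluding Text itself."""
    return {
        dtype: {fmt: row["Text"] / size for fmt, size in row.items() if fmt != "Text"}
        for dtype, row in sizes.items()
    }


def best_formats(ratio_row):
    """All formats sharing the highest compression ratio (ties are kept)."""
    top = max(ratio_row.values())
    return [fmt for fmt, ratio in ratio_row.items() if ratio == top]


def carbon_gap(size_row):
    """How many times larger Carbon's output is than the smaller of Parquet/ORC."""
    return size_row["Carbon"] / min(size_row["Parquet"], size_row["ORC"])


if __name__ == "__main__":
    ratios = compression_ratios(SIZES_GB)
    for dtype, row in ratios.items():
        cells = ", ".join(f"{fmt} {r:.2f}x" for fmt, r in row.items())
        print(f"{dtype}: {cells}; best: {', '.join(best_formats(row))}")
    for dtype in ("decimal", "int"):
        gap = carbon_gap(SIZES_GB[dtype])
        print(f"Carbon vs best of Parquet/ORC for {dtype}: {gap:.2f}x larger")
```

On these figures Carbon's output is about 2.17x the size of ORC's for decimal and 3x the size of Parquet/ORC for int, while it has the highest ratio for both String rows. This matches the observation that the numeric types are where Carbon lags.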

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)