
[jira] [Updated] (CARBONDATA-431) Analysis compression for numeric datatype compared with Parquet/ORC

Akash R Nilugal (Jira)

[ https://issues.apache.org/jira/browse/CARBONDATA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suo tong updated CARBONDATA-431:
--------------------------------
Summary: Analysis compression for numeric datatype compared with Parquet/ORC (was: Analysis compression for numric datatype compared with Parquet/ORC)

> Analysis compression for numeric datatype compared with Parquet/ORC
> -------------------------------------------------------------------
>
> Key: CARBONDATA-431
> URL: https://issues.apache.org/jira/browse/CARBONDATA-431
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: suo tong
> Assignee: Jacky Li
>
> Broken down by data type, Carbon's string type has a better compression ratio than Parquet and ORC, but for numeric data types its compression is worse than both:
> DataType                   | Text | Parquet | ORC | Carbon
> decimal                    | 16G  | 11G     | 6G  | 13G
> int                        | 5G   | 1G      | 1G  | 3G
> String (high cardinality)  | 24G  | 22G     | 11G | 3G (no dictionary)
> String (low cardinality)   | 30G  | 4G      | 4G  | 1G (dictionary encode)
>                            |      |         |     | 1G (dictionary encode without inverted index)
>                            |      |         |     | 3G (no dictionary encode)
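The sizes in the table can be turned into per-format compression ratios to make the gap on numeric types explicit. The sketch below is a minimal illustration, assuming the "Text" column is the raw input size and the other columns are stored sizes; the names `SIZES_GB`, `compression_ratios` and `best_format` are hypothetical helpers, not part of CarbonData. For the low-cardinality string row it uses the 1G dictionary-encoded Carbon figure.

```python
# Hypothetical helper: sizes in GB copied from the CARBONDATA-431 table.
# Assumption: "Text" is the raw input size; the other columns are stored sizes.
# For low-cardinality strings, Carbon uses the 1G dictionary-encoded figure.
SIZES_GB = {
    "decimal": {"Text": 16, "Parquet": 11, "ORC": 6, "Carbon": 13},
    "int": {"Text": 5, "Parquet": 1, "ORC": 1, "Carbon": 3},
    "String (high cardinality)": {"Text": 24, "Parquet": 22, "ORC": 11, "Carbon": 3},
    "String (low cardinality)": {"Text": 30, "Parquet": 4, "ORC": 4, "Carbon": 1},
}


def compression_ratios(sizes):
    """Return {row: {format: text_size / stored_size}} for every non-Text column."""
    ratios = {}
    for row, cols in sizes.items():
        text = cols["Text"]
        ratios[row] = {fmt: text / size for fmt, size in cols.items() if fmt != "Text"}
    return ratios


def best_format(row_sizes):
    """Format with the smallest stored size; on a tie, the first one listed wins."""
    candidates = {fmt: size for fmt, size in row_sizes.items() if fmt != "Text"}
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    for row, ratio_by_fmt in compression_ratios(SIZES_GB).items():
        cells = ", ".join(f"{fmt} {r:.1f}x" for fmt, r in ratio_by_fmt.items())
        print(f"{row}: {cells}; smallest: {best_format(SIZES_GB[row])}")
```

Running it shows Carbon with the lowest ratio on the `decimal` and `int` rows and the highest on both string rows, which is the pattern the issue describes.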

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)