[ https://issues.apache.org/jira/browse/CARBONDATA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
suo tong updated CARBONDATA-431:
--------------------------------
Summary: Analysis compression for numeric datatype compared with Parquet/ORC (was: Analysis compression for numric datatype compared with Parquet/ORC)
> Analysis compression for numeric datatype compared with Parquet/ORC
> -------------------------------------------------------------------
>
> Key: CARBONDATA-431
> URL: https://issues.apache.org/jira/browse/CARBONDATA-431
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: suo tong
> Assignee: Jacky Li
>
> Carbon's string type has a better compression ratio than Parquet and ORC, but its numeric types (decimal, int) compress worse, as the on-disk sizes below show:
> DataType                  | Text | Parquet | ORC | Carbon
> decimal                   | 16G  | 11G     | 6G  | 13G
> int                       | 5G   | 1G      | 1G  | 3G
> String (high cardinality) | 24G  | 22G     | 11G | 3G (no dictionary)
> String (low cardinality)  | 30G  | 4G      | 4G  | 1G (dictionary encode)
>                           |      |         |     | 1G (dictionary encode without inverted index)
>                           |      |         |     | 3G (no dictionary encode)
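To make the comparison easier to read, the sizes above can be turned into compression ratios relative to the Text baseline. The following is only a rough sketch: the figures are copied from the table, while the dictionary layout, the row labels, and the helper name compression_ratios are assumptions made for illustration and are not part of CarbonData.

```python
# Sizes in GB, copied from the table above.
# The dict layout and the row labels are illustrative assumptions.
SIZES_GB = {
    "decimal": {"Text": 16, "Parquet": 11, "ORC": 6, "Carbon": 13},
    "int": {"Text": 5, "Parquet": 1, "ORC": 1, "Carbon": 3},
    "String, high cardinality (Carbon: no dictionary)": {
        "Text": 24, "Parquet": 22, "ORC": 11, "Carbon": 3,
    },
    "String, low cardinality (Carbon: dictionary encode)": {
        "Text": 30, "Parquet": 4, "ORC": 4, "Carbon": 1,
    },
}


def compression_ratios(sizes):
    """Return Text size divided by each format's size (higher is better)."""
    text_size = sizes["Text"]
    return {
        fmt: round(text_size / size, 2)
        for fmt, size in sizes.items()
        if fmt != "Text"
    }


if __name__ == "__main__":
    for dtype, sizes in SIZES_GB.items():
        print(dtype, compression_ratios(sizes))
```

Read this way, Carbon's ratio for int is about a third of what Parquet and ORC achieve, while for strings it is the highest of the three formats.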