[ https://issues.apache.org/jira/browse/CARBONDATA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
suo tong updated CARBONDATA-431:
--------------------------------
Description:
By data type, Carbon's string type has a better compression ratio than Parquet and ORC, but for the numeric types (decimal, int) its files are larger than both in the measurements below.

DataType                  | Text | Parquet | ORC | Carbon
decimal                   | 16G  | 11G     | 6G  | 13G
int                       | 5G   | 1G      | 1G  | 3G
String (high cardinality) | 24G  | 22G     | 11G | 3G (no dictionary)
String (low cardinality)  | 30G  | 4G      | 4G  | 1G (dictionary encode)
                          |      |         |     | 1G (dictionary encode without inverted index)
                          |      |         |     | 3G (no dictionary encode)
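
To make the comparison easier to read, the sizes in the table above can be turned into compression ratios against the Text baseline. This is only a rough sketch: it assumes "G" means gigabytes and that Text is the uncompressed input, and the names `sizes_gb` and `compression_ratios` are made up for illustration. For the low-cardinality String row it uses Carbon's dictionary-encoded figure (1G).

```python
# Sizes in GB copied from the table above; Text is treated as the baseline.
# Assumption: the low-cardinality String row uses Carbon's dictionary-encoded size (1G).
sizes_gb = {
    "decimal": {"Text": 16, "Parquet": 11, "ORC": 6, "Carbon": 13},
    "int": {"Text": 5, "Parquet": 1, "ORC": 1, "Carbon": 3},
    "String (high cardinality)": {"Text": 24, "Parquet": 22, "ORC": 11, "Carbon": 3},
    "String (low cardinality)": {"Text": 30, "Parquet": 4, "ORC": 4, "Carbon": 1},
}


def compression_ratios(row):
    """Return Text size divided by each format's size (higher is better)."""
    baseline = row["Text"]
    return {fmt: baseline / size for fmt, size in row.items() if fmt != "Text"}


ratios = {dtype: compression_ratios(row) for dtype, row in sizes_gb.items()}

for dtype, per_format in ratios.items():
    summary = ", ".join(f"{fmt} {value:.1f}x" for fmt, value in per_format.items())
    print(f"{dtype}: {summary}")
```

Read this way, Carbon is ahead on both String rows and behind Parquet and ORC on decimal and int, which is the gap this sub-task is meant to analyze.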
> Analysis compression for numeric datatype compared with Parquet/ORC
> -------------------------------------------------------------------
>
> Key: CARBONDATA-431
> URL: https://issues.apache.org/jira/browse/CARBONDATA-431
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: suo tong
> Assignee: Jacky Li