[jira] [Updated] (CARBONDATA-431) Analysis compression for numeric datatype compared with Parquet/ORC

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suo tong updated CARBONDATA-431:
--------------------------------
    Summary: Analysis compression for numeric datatype compared with Parquet/ORC  (was: Analysis compression for numric datatype compared with Parquet/ORC)

> Analysis compression for numeric datatype compared with Parquet/ORC
> -------------------------------------------------------------------
>
>                 Key: CARBONDATA-431
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-431
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: suo tong
>            Assignee: Jacky Li
>
> Among the data types, Carbon's string type has a better compression ratio than Parquet and ORC, but for numeric types (decimal, int) its output is larger than both:
> DataType                  | Text | Parquet | ORC | Carbon
> decimal                   | 16G  | 11G     | 6G  | 13G
> int                       | 5G   | 1G      | 1G  | 3G
> String (high cardinality) | 24G  | 22G     | 11G | 3G (no dictionary)
> String (low cardinality)  | 30G  | 4G      | 4G  | 1G (dictionary encode)
>                           |      |         |     | 1G (dictionary encode without inverted index)
>                           |      |         |     | 3G (no dictionary encode)
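
To make the comparison easier to read, the sizes reported above can be turned into compression ratios relative to the text size. The following is only a minimal sketch: the dictionary layout and the function names are illustrative assumptions, not part of CarbonData, and the low-cardinality string row uses Carbon's 1G dictionary-encoded figure.

```python
# Sizes in GB, copied from the table in the issue description.
# For low-cardinality strings the Carbon value is the 1G
# dictionary-encoded result (an assumption made for this sketch).
SIZES_GB = {
    "decimal": {"Text": 16, "Parquet": 11, "ORC": 6, "Carbon": 13},
    "int": {"Text": 5, "Parquet": 1, "ORC": 1, "Carbon": 3},
    "String (high cardinality)": {"Text": 24, "Parquet": 22, "ORC": 11, "Carbon": 3},
    "String (low cardinality)": {"Text": 30, "Parquet": 4, "ORC": 4, "Carbon": 1},
}


def compression_ratios(sizes):
    """Return text_size / format_size for every non-text format."""
    text = sizes["Text"]
    return {fmt: text / size for fmt, size in sizes.items() if fmt != "Text"}


def carbon_is_smallest(sizes):
    """True if Carbon's output is no larger than Parquet's and ORC's."""
    return sizes["Carbon"] <= min(sizes["Parquet"], sizes["ORC"])


if __name__ == "__main__":
    for dtype, sizes in SIZES_GB.items():
        ratios = compression_ratios(sizes)
        summary = ", ".join(f"{fmt} {ratio:.1f}x" for fmt, ratio in ratios.items())
        print(f"{dtype}: {summary}")
```

On these figures, Carbon produces the smallest files for both string cases but not for decimal or int, which is the gap this sub-task is meant to analyze.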



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)