|
Dear Dev team,
I asked a question several days ago about RLE and DELTA encoding in
Carbon. Thank you for pointing me to the source code of the
implementation. I have read through the code and reached the following
understanding. Could you please confirm whether it is correct? Thanks!
1. RLE encoding only applies to columns that have Encoding.DICTIONARY
enabled and whose cardinality is less than the parameter
CarbonCommonConstants.HIGH_CARDINALITY_VALUE.
I saw that the RLE encoding is applied to the data in the function
/BlockIndexerStorageForInt.compressDataMyOwnWay/, and is controlled by
/aggKeyBlock/, whose value is set by /arrangeUniqueBlockType/.
If my understanding is correct, could you please share the reasons for
designing the logic this way?
2. DELTA encoding is implemented in
/ValueCompressionUtil.getCompressedValues/. It does not do a sequential
DELTA encoding, i.e., encoding a list of numbers a, b, c, ... as a,
b-a, c-b, ... Instead, it does a max-delta encoding: for a, b, c, ...
with maximum value M, it encodes them as M-a, M-b, M-c, ...
Could you please also share the reasoning behind choosing this
encoding?
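To make point 2 concrete, here is a minimal sketch of the two schemes as
I understand them. The class and method names (DeltaSketch,
sequentialDelta, maxDelta, decodeMaxDelta) are my own, for illustration
only, and are not taken from the Carbon code base.

```java
import java.util.Arrays;

public class DeltaSketch {

    // Sequential delta: keep the first value, then store each
    // value's difference from its predecessor (a, b-a, c-b, ...).
    static long[] sequentialDelta(long[] values) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (i == 0) ? values[i] : values[i] - values[i - 1];
        }
        return out;
    }

    // Max-delta: store each value's distance from the maximum M
    // (M-a, M-b, M-c, ...). M must be kept alongside the output.
    static long[] maxDelta(long[] values) {
        if (values.length == 0) {
            return new long[0];
        }
        long max = Arrays.stream(values).max().getAsLong();
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = max - values[i];
        }
        return out;
    }

    // Decoding max-delta needs only M and the single stored delta.
    static long decodeMaxDelta(long delta, long max) {
        return max - delta;
    }

    public static void main(String[] args) {
        long[] values = {100, 103, 101, 107};
        System.out.println(Arrays.toString(sequentialDelta(values))); // [100, 3, -2, 6]
        System.out.println(Arrays.toString(maxDelta(values)));        // [7, 4, 6, 0]
    }
}
```

One difference I notice in this sketch: the max-delta output is always
non-negative and each entry can be decoded on its own given M, while the
sequential form can produce negative values and needs a running sum to
decode. I do not know whether this is the actual motivation.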
Thanks!
Regards,
Hao Jiang
|