Apache CarbonData Dev Mailing List archive

Re: Improving Non-dictionary storage & performance.

Posted by David CaiQiang on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improving-Non-dictionary-storage-performance-tp8146p8412.html

+1

I agree.

About non-dictionary columns in sort_columns:
1. Sort the column data in ColumnChunk2.

2. Compress each column according to its data type:
string: RLE, or Snappy if RLE does not compress well
short, int, bigint: delta encoding plus a number compressor (ValueCompressor and NumberCompressor)
float, double: delta encoding plus Snappy (ValueCompressor and SnappyCompressor)
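A minimal sketch of the string case, assuming the values in a chunk are already sorted (step 1), so equal values sit next to each other and a single forward scan finds every run. The class StringRle, its Run record, and the "at least halve the entries" threshold in rleIsWorthIt are hypothetical illustrations, not CarbonData APIs, and the Snappy fallback itself is not shown (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch, not CarbonData code: run-length encoding of a sorted string column.
final class StringRle {
    // One run: a value and the number of consecutive rows that hold it.
    record Run(String value, int length) {}

    static List<Run> encode(List<String> sorted) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String value = sorted.get(i);
            int j = i + 1;
            // Sorting makes equal values adjacent, so one forward scan finds each run.
            while (j < sorted.size() && Objects.equals(sorted.get(j), value)) {
                j++;
            }
            runs.add(new Run(value, j - i));
            i = j;
        }
        return runs;
    }

    static List<String> decode(List<Run> runs) {
        List<String> out = new ArrayList<>();
        for (Run run : runs) {
            for (int k = 0; k < run.length(); k++) {
                out.add(run.value());
            }
        }
        return out;
    }

    // Hypothetical "RLE is good" check: keep RLE only if it at least halves the
    // number of entries; otherwise the caller would fall back to Snappy.
    static boolean rleIsWorthIt(int rowCount, int runCount) {
        return runCount * 2 <= rowCount;
    }
}
```

With many repeated values the run list is much shorter than the row list; with mostly distinct strings it is not, which is the case where the Snappy fallback would apply.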

3. Store each column according to its data type:
string: byte[], with a null character as the separator between values
short, int, bigint: byte[], using max/min to calculate a fixed width for storing the delta values
float, double: byte[], decompressed into float[] or double[]
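A minimal sketch of the min/max fixed-width idea for integer types, assuming short, int and bigint are all widened to long and that a small header (min, width, row count) precedes the packed deltas. FixedWidthDelta and its byte layout are hypothetical; this is not the actual ValueCompressor/NumberCompressor format.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not CarbonData code: store (value - min) in the smallest
// fixed width (1, 2, 4 or 8 bytes) that can hold max - min.
final class FixedWidthDelta {
    static int widthFor(long min, long max) {
        long range = max - min;
        if (range < 0) return 8; // max - min overflowed a long, so use the full width
        if (range <= 0xFFL) return 1;
        if (range <= 0xFFFFL) return 2;
        if (range <= 0xFFFF_FFFFL) return 4;
        return 8;
    }

    static byte[] encode(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (values.length == 0) {
            min = 0;
            max = 0;
        }
        int width = widthFor(min, max);
        // Header: min (8 bytes) + width (1 byte) + row count (4 bytes), then the deltas.
        ByteBuffer buf = ByteBuffer.allocate(8 + 1 + 4 + width * values.length);
        buf.putLong(min);
        buf.put((byte) width);
        buf.putInt(values.length);
        for (long v : values) {
            long delta = v - min; // wraps on overflow; decoding wraps back
            switch (width) {
                case 1 -> buf.put((byte) delta);
                case 2 -> buf.putShort((short) delta);
                case 4 -> buf.putInt((int) delta);
                default -> buf.putLong(delta);
            }
        }
        return buf.array();
    }

    static long[] decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long min = buf.getLong();
        int width = buf.get();
        int count = buf.getInt();
        long[] out = new long[count];
        for (int i = 0; i < count; i++) {
            // Mask narrow reads so deltas are treated as unsigned.
            long delta = switch (width) {
                case 1 -> buf.get() & 0xFFL;
                case 2 -> buf.getShort() & 0xFFFFL;
                case 4 -> buf.getInt() & 0xFFFF_FFFFL;
                default -> buf.getLong();
            };
            out[i] = min + delta;
        }
        return out;
    }
}
```

Limiting widths to 1, 2, 4 or 8 bytes keeps every delta readable with a plain ByteBuffer get; bit-packing to the exact width would save more space at the cost of more complex decoding.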

4. Filter the column
Column level: ExcludeFilterExecuterImpl, IncludeFilterExecuterImpl, RangeFilterExecuter
At column level, RangeFilterExecuter should calculate the index range (start and end) within the sorted data chunk and use it to get the bitset for the uncompressed data.
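A minimal sketch of the column-level range filter, assuming the chunk has already been decompressed into a sorted long[] and that both bounds are inclusive. Because the data is sorted, all matching rows form one contiguous index range: two binary searches find start and end, and a single BitSet.set(start, end) call marks them, with no per-row comparison. SortedRangeFilter is a hypothetical name, not the real RangeFilterExecuter.

```java
import java.util.BitSet;

// Hypothetical sketch, not CarbonData code: range filter on a sorted chunk.
final class SortedRangeFilter {
    // First index whose value is >= key.
    static int lowerBound(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index whose value is > key.
    static int upperBound(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Rows with lower <= value <= upper occupy the index range [start, end).
    static BitSet filter(long[] sorted, long lower, long upper) {
        BitSet result = new BitSet(sorted.length);
        int start = lowerBound(sorted, lower);
        int end = upperBound(sorted, upper);
        if (start < end) {
            result.set(start, end);
        }
        return result;
    }
}
```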

@Ravindra, please correct me if I am wrong.

Best Regards
David Cai