Posted by David CaiQiang on Mar 08, 2017; 3:39am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improving-Non-dictionary-storage-performance-tp8146p8412.html
+1
I agree.
Regarding non-dictionary columns in sort_columns:
1. Sort the column data in ColumnChunk2.
2. Compress the column by data type:
string: RLE, or snappy (if RLE does not compress well)
short, int, bigint: delta plus number compression (ValueCompressor and NumberCompressor)
float, double: delta plus snappy (ValueCompressor and SnappyCompressor)
3. Store the column by data type:
string: byte[], with a null character as the separator
short, int, bigint: byte[], using max/min to calculate a fixed length for storing each delta value
float, double: byte[], decompressed to float[] or double[]
4. Filter the column:
column level: ExcludeFilterExecuterImpl, IncludeFilterExecuterImpl, RangeFilterExecuter
The column-level RangeFilterExecuter should calculate the index range (start and end) within the sorted data chunk and use it to build the bitset of the uncompressed result.
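
To make the integer case in points 3 and 4 concrete, here is a rough sketch in Python rather than the actual CarbonData Java code. It assumes "delta" means the offset of each value from the column minimum, and that the chunk is non-empty and already sorted. All function names are hypothetical and are not CarbonData APIs.

```python
from bisect import bisect_left, bisect_right


def fixed_width_bytes(min_value, max_value):
    """Bytes needed to store any (value - min_value) as an unsigned integer."""
    span = max_value - min_value
    return max(1, (span.bit_length() + 7) // 8)


def encode_sorted_ints(values):
    """Pack a sorted, non-empty int column as fixed-width deltas from its minimum."""
    min_value, max_value = values[0], values[-1]
    width = fixed_width_bytes(min_value, max_value)
    payload = b"".join((v - min_value).to_bytes(width, "big") for v in values)
    return min_value, width, payload


def decode_sorted_ints(min_value, width, payload):
    """Rebuild the original values from the fixed-width delta bytes."""
    return [
        min_value + int.from_bytes(payload[i:i + width], "big")
        for i in range(0, len(payload), width)
    ]


def range_index(sorted_values, low, high):
    """Index range [start, end) of rows with low <= value <= high."""
    start = bisect_left(sorted_values, low)
    end = bisect_right(sorted_values, high)
    return start, end


def to_bitset(start, end, row_count):
    """Expand an index range into a per-row boolean bitset."""
    return [start <= i < end for i in range(row_count)]


column = [1000, 1003, 1003, 1010, 1250]
min_value, width, payload = encode_sorted_ints(column)
decoded = decode_sorted_ints(min_value, width, payload)
start, end = range_index(decoded, 1003, 1010)
bitset = to_bitset(start, end, len(decoded))
```

Because the data is sorted, the range filter needs only two binary searches to find start and end, and every row between them matches, so no per-row comparison is required.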
@Ravindra, please correct me if I am wrong.
Best Regards
David Cai