http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improving-Non-dictionary-storage-performance-tp8146p8154.html
You are right, thats why we can do no-dictionary only for String datatype.
Please look at my first point. we can always use direct dictionary for
possible data types like short, int, long, double & float for sort_columns.
Ravindra.
> Hi Ravi,
> Sorting of data for no dictionary should be based on data type + same for
> filter . Please add this point.
>
> -Regards
> Kumar Vishal
>
> On Wed, Mar 1, 2017 at 8:34 PM, Ravindra Pesala <
[hidden email]>
> wrote:
>
> > Hi,
> >
> > In order to make non-dictionary columns storage and performance more
> > efficient, I am suggesting following improvements.
> >
> > 1. Make always SHORT, INT, BIGINT, DOUBLE & FLOAT always direct
> > dictionary.
> > Right now only date and timestamp are direct dictionary columns. We
> can
> > make SHORT, INT, BIGINT, DOUBLE & FLOAT Direct dictionary if these
> columns
> > are included in SORT_COLUMNS
> >
> > 2. Consider delta/value compression while storing direct dictionary
> values.
> > Right now it always uses INT datatype to store direct dictionary values.
> So
> > we can consider value/Delta compression to compact the storage.
> >
> > 3. Use the Separator instead of LV format to store String value in
> > no-dictionary format.
> > Currently String datatypes for non-dictionary colums are stored as
> > LV(length value) format, here we are using Short(2 bytes) as length
> always.
> > In order to keep storage compact we can use separator (0 byte as
> separator)
> > it just takes single byte. And while reading we can traverse through data
> > and get the offsets like we are doing now.
> >
> > 4. Add Range filters for no-dictionary columns.
> > Currently range filters like greater/ less than filters are not
> implemented
> > for no-dictionary columns. So we should implement them to avoid row level
> > filter and improve the performance.
> >
> > Regards,
> > Ravindra.
> >
>