Apache CarbonData Dev Mailing List archive

Re: Improving Non-dictionary storage & performance.

Posted by ravipesala on Mar 01, 2017; 3:31pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Improving-Non-dictionary-storage-performance-tp8146p8154.html

Hi Vishal,

You are right, that's why we can use no-dictionary only for the String datatype.
Please look at my first point. We can always use direct dictionary for
possible data types like short, int, long, double & float for sort_columns.

Regards,
Ravindra.

On 1 March 2017 at 18:18, Kumar Vishal <[hidden email]> wrote:

> Hi Ravi,
> Sorting of data for no-dictionary columns should be based on the data type,
> and the same applies to filters. Please add this point.
>
> -Regards
> Kumar Vishal
>
> On Wed, Mar 1, 2017 at 8:34 PM, Ravindra Pesala <[hidden email]>
> wrote:
>
> > Hi,
> >
> > To make the storage and performance of non-dictionary columns more
> > efficient, I am suggesting the following improvements.
> >
> > 1. Always make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary.
> > Right now only date and timestamp are direct dictionary columns. We can
> > make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary if these columns
> > are included in SORT_COLUMNS.
> >
> > 2. Consider delta/value compression while storing direct dictionary values.
> > Right now it always uses the INT datatype to store direct dictionary values,
> > so we can consider value/delta compression to compact the storage.
> >
> > 3. Use a separator instead of LV format to store String values in
> > no-dictionary format.
> > Currently String datatypes for non-dictionary columns are stored in
> > LV (length-value) format, where we always use a Short (2 bytes) as the
> > length. To keep storage compact we can use a separator (a 0 byte), which
> > takes just a single byte. While reading, we can traverse the data and get
> > the offsets like we are doing now.
> >
> > 4. Add range filters for no-dictionary columns.
> > Currently range filters like greater-than/less-than filters are not
> > implemented for no-dictionary columns. We should implement them to avoid
> > row-level filtering and improve performance.
> >
> > Regards,
> > Ravindra.
> >
>
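
To illustrate point 3 in the quoted proposal, here is a minimal Python sketch comparing the LV layout with a 0-byte separator layout. It is not CarbonData code: the function names, the big-endian 2-byte length and the UTF-8 encoding are assumptions made for the example, and the separator variant only works if no value contains a 0 byte.

```python
import struct


def encode_lv(values):
    """Length-value layout: 2-byte length (assumed big-endian), then the bytes."""
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        out += struct.pack(">H", len(data))  # raises struct.error above 65535 bytes
        out += data
    return bytes(out)


def encode_separated(values):
    """Separator layout: the bytes of each value followed by a single 0 byte."""
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        if b"\x00" in data:
            raise ValueError("value contains the separator byte")
        out += data + b"\x00"
    return bytes(out)


def separated_offsets(buf):
    """Traverse the buffer once and return (start, end) offsets of every value."""
    offsets = []
    start = 0
    for i, b in enumerate(buf):
        if b == 0:
            offsets.append((start, i))
            start = i + 1
    return offsets


def decode_separated(buf):
    """Rebuild the strings from the offsets found by the scan."""
    return [buf[s:e].decode("utf-8") for s, e in separated_offsets(buf)]


values = ["carbon", "", "data"]
lv = encode_lv(values)
sep = encode_separated(values)
```

In this sketch the separator layout saves exactly one byte per value compared with LV, and the reader still has to scan the data once to recover the offsets.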

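Point 2 in the quoted proposal (delta/value compression of direct dictionary values) could look roughly like the sketch below. This is only an illustration under assumptions: `delta_encode` and `delta_decode` are hypothetical names, and storing the minimum as a base plus fixed-width unsigned deltas is one common delta scheme, not necessarily the one CarbonData uses.

```python
import struct

# (largest unsigned delta that fits, struct format code), smallest width first
_WIDTHS = [
    (0xFF, "B"),                # 1 byte
    (0xFFFF, "H"),              # 2 bytes
    (0xFFFFFFFF, "I"),          # 4 bytes
    (0xFFFFFFFFFFFFFFFF, "Q"),  # 8 bytes
]


def delta_encode(values):
    """Store min(values) as the base and each value as an unsigned delta,
    using the narrowest fixed width that can hold the largest delta."""
    if not values:
        raise ValueError("nothing to encode")
    base = min(values)
    span = max(values) - base
    for limit, code in _WIDTHS:
        if span <= limit:
            break
    else:
        raise OverflowError("value range does not fit in 8 bytes")
    deltas = [v - base for v in values]
    payload = struct.pack(">%d%s" % (len(values), code), *deltas)
    return base, code, payload


def delta_decode(base, code, payload):
    """Inverse of delta_encode: unpack the deltas and add the base back."""
    count = len(payload) // struct.calcsize(">" + code)
    deltas = struct.unpack(">%d%s" % (count, code), payload)
    return [base + d for d in deltas]


# Example input: four direct dictionary values that are close together.
days = [17226, 17227, 17230, 17300]
base, code, payload = delta_encode(days)
```

For the four sample values the deltas fit in one byte each, so the payload is 4 bytes instead of the 16 bytes a fixed 4-byte INT per value would take (the base and width code still have to be stored once per page or block).
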
--
Thanks & Regards,
Ravi