
Re: Introducing V3 format.

Posted by kumarvishal09 on Mar 01, 2017; 12:23pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Introducing-V3-format-tp7609p8142.html

Hi Bill,
For a non-filter query (full scan query), the V3 format lets carbon read
more data in a single IO, because we can increase the number of pages in a
blocklet. Fewer IO operations are needed, so the total IO time goes down.
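
A rough way to see the effect is to count reads per column. The sketch below assumes one IO per column blocklet, a hypothetical table of 9,600,000 rows, and 10 pages per blocklet; only the 120000 blocklet size and the 32000 page size come from this thread, and the function names are made up for illustration.

```python
PAGE_ROWS = 32000           # page size stated in the thread
OLD_BLOCKLET_ROWS = 120000  # blocklet size of the current format, per the thread


def v3_rows_per_blocklet(pages_per_blocklet):
    """Rows held by one V3 blocklet for a given (configurable) page count."""
    return pages_per_blocklet * PAGE_ROWS


def reads_per_column(total_rows, rows_per_blocklet):
    """Blocklet reads needed to scan one column.

    Assumption: each column blocklet costs exactly one IO.
    """
    # Integer ceiling division, avoids floating point.
    return -(-total_rows // rows_per_blocklet)


total_rows = 9_600_000  # hypothetical table size
old_reads = reads_per_column(total_rows, OLD_BLOCKLET_ROWS)
new_reads = reads_per_column(total_rows, v3_rows_per_blocklet(10))
print(old_reads, new_reads)  # 80 30
```

Under these assumptions a full scan of one column drops from 80 reads to 30. The real gain depends on the configured page count and on how the reader batches IO.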

-Regards
Kumar Vishal

On Wed, Mar 1, 2017 at 5:39 PM, bill.zhou <[hidden email]> wrote:

> Hi Ravindra,
>
> As you described, V3 will benefit the IO scenario (meaning more filters).
> What about the CPU scenario (no filter, full scan with aggregation)? Is
> there any advantage for that?
>
> Regards
> Bill
>
> ravipesala wrote
> > Problems in the current format:
> > 1. IO read is slower because it needs multiple seeks on the file to
> > read column blocklets. The current blocklet size is 120000, so it needs
> > to read from the file multiple times to scan the data of that column.
> > Alternatively, we can increase the blocklet size, but filter queries
> > suffer because they get a big blocklet to filter.
> > 2. Decompression is slower in the current format. We use an inverted
> > index for faster filter queries and use NumberCompressor to compress the
> > inverted index with bit-wise packing. This becomes slower, so we should
> > avoid NumberCompressor. One alternative is to keep the blocklet size
> > within 32000 so that the inverted index can be written as short, but
> > IO read suffers a lot.
> >
> > To overcome the above 2 issues we are introducing the new V3 format.
> > Here each blocklet has multiple pages of size 32000, and the number of
> > pages in a blocklet is configurable. Since we keep each page within the
> > short limit, there is no need to compress the inverted index here.
> > We also maintain the max/min for each page to further prune filter
> > queries.
> > The blocklet is read with all its pages at once and kept in offheap
> > memory.
> > During filtering, first check the max/min range, and if it is valid then
> > decompress the page to filter further.
> >
> > Please find the attached V3 format thrift file.
> >
> > --
> > Thanks & Regards,
> > Ravi
>
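
The page-level pruning described in the quoted proposal can be sketched as follows. This is only an illustration, not CarbonData code: the `Page` class, `build_pages`, and `filter_equals` are hypothetical names, zlib stands in for whatever compressor the format actually uses, and the equality filter is just one example of a predicate.

```python
import array
import zlib
from dataclasses import dataclass

PAGE_ROWS = 32000  # page size stated in the thread


@dataclass
class Page:
    min_value: int     # smallest value stored in the page
    max_value: int     # largest value stored in the page
    compressed: bytes  # compressed page data (zlib is an assumption)


def build_pages(values, page_rows=PAGE_ROWS):
    """Split a column into pages and record min/max for each page."""
    pages = []
    for start in range(0, len(values), page_rows):
        chunk = values[start:start + page_rows]
        raw = array.array("q", chunk).tobytes()  # 64-bit signed integers
        pages.append(Page(min(chunk), max(chunk), zlib.compress(raw)))
    return pages


def filter_equals(pages, target):
    """Return matching values and the number of pages decompressed."""
    matches = []
    decompressed = 0
    for page in pages:
        # Prune: skip the page if the target lies outside its min/max range.
        if target < page.min_value or target > page.max_value:
            continue
        decompressed += 1
        chunk = array.array("q")
        chunk.frombytes(zlib.decompress(page.compressed))
        matches.extend(v for v in chunk if v == target)
    return matches, decompressed


pages = build_pages(list(range(100_000)))  # 4 pages: 3 full, 1 partial
found, touched = filter_equals(pages, 70_000)
print(found, touched)  # [70000] 1
```

With sorted sample data only one of the four pages is decompressed. On unsorted data the min/max ranges overlap more, so fewer pages can be skipped.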