Apache CarbonData Dev Mailing List archive - Re: [Discussion] Please vote and comment for carbon data file format change

Posted by kumarvishal09 on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Please-vote-and-comment-for-carbon-data-file-format-change-tp2491p2500.html

Hi Xiaoqiao He,

Please find the attachment.

-Regards

Kumar Vishal

On Tue, Nov 1, 2016 at 9:27 PM, Xiaoqiao He <[hidden email]> wrote:

Hi Kumar Vishal,

I couldn't see the figures of the file format. Could you re-upload them?
Thanks.

Best Regards

On Tue, Nov 1, 2016 at 7:12 PM, Kumar Vishal <[hidden email]>
wrote:

>
> Hello All,
>
> Improving Carbon first-time query performance
>
> Reasons:
> 1. When the file system cache has been cleared, data must be read from
> disk and cached again, which makes reads slower
> 2. On a first-time query, Carbon has to read the footer from the data
> file to build the B-tree
> 3. Carbon reads more footer data than is required (the data chunk)
> 4. Carbon performs many random seeks because the parts of a column's
> data (data page, RLE, inverted index) are not stored together.
>
> Solution:
> 1. Improve block loading time. This can be done by removing the data chunk
> from blockletInfo and storing only the offset and length of the data chunk
> 2. Compress the presence meta bitset stored for null values of measure
> columns using Snappy
> 3. Store the metadata and data of a column together and read them together;
> this reduces random seeks and improves I/O
>
> For this I am planning to change the CarbonData Thrift format
>
> *Old format* [figure not available]
>
> *New format* [figure not available]
>
> Please vote and comment on this new format change
>
> -Regards
> Kumar Vishal
>
>
>
>
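
The three points of the proposed solution can be illustrated with a minimal sketch. This is not the CarbonData implementation or its Thrift schema: the function names, the length-prefixed layout, and the dictionary keys are all hypothetical, and zlib stands in for Snappy only because Snappy is not in the Python standard library. The footer keeps just an (offset, length) pair per column, and each column's metadata, data page, RLE data, and inverted index are written contiguously so that one seek and one read fetch all of them.

```python
import io
import struct
import zlib

# Hypothetical names for the pieces of one column chunk.
PART_NAMES = ("meta", "data_page", "rle", "inverted_index")


def compress_null_bitset(bitset: bytes) -> bytes:
    # Assumption: zlib is a stand-in; the proposal names Snappy.
    return zlib.compress(bitset)


def write_blocklet(columns):
    """Write each column's parts back to back.

    Returns the file bytes and a footer holding only (offset, length)
    per column, instead of the full chunk metadata.
    """
    buf = io.BytesIO()
    footer = []
    for col in columns:
        offset = buf.tell()
        for name in PART_NAMES:
            part = col[name]
            buf.write(struct.pack("<I", len(part)))  # 4-byte length prefix
            buf.write(part)
        footer.append((offset, buf.tell() - offset))
    return buf.getvalue(), footer


def read_column(file_bytes, footer, index):
    """Fetch one column with a single seek and a single contiguous read."""
    offset, length = footer[index]
    stream = io.BytesIO(file_bytes)
    stream.seek(offset)
    chunk = stream.read(length)
    # Split the chunk into its parts in memory, with no further I/O.
    parts, pos = [], 0
    while pos < len(chunk):
        (size,) = struct.unpack_from("<I", chunk, pos)
        pos += 4
        parts.append(chunk[pos:pos + size])
        pos += size
    return dict(zip(PART_NAMES, parts))


null_bits = bytes(1024)  # presence bitset with no nulls set
columns = [
    {"meta": compress_null_bitset(null_bits), "data_page": b"abc",
     "rle": b"", "inverted_index": b"\x01\x02"},
    {"meta": b"m1", "data_page": b"defgh",
     "rle": b"\x03", "inverted_index": b""},
]
file_bytes, footer = write_blocklet(columns)
second = read_column(file_bytes, footer, 1)
```

In this sketch the footer grows by a fixed two integers per column regardless of how large the chunk metadata is, which is the property the first point relies on to shorten block loading.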
