Apache CarbonData Dev Mailing List archive - Re: [Discussion] Please vote and comment for carbon data file format change

Posted by kumarvishal09 on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Please-vote-and-comment-for-carbon-data-file-format-change-tp2491p3390.html

Hi All,
Please find below the JIRA issue I have raised for the above discussion.

https://issues.apache.org/jira/browse/CARBONDATA-458

-Regards
Kumar Vishal

On Tue, Nov 29, 2016 at 7:14 PM, Kumar Vishal <[hidden email]>
wrote:

> Hi Jihong Ma,
> Please find the attachment.
>
> -Regards
> Kumar Vishal
>
> On Fri, Nov 4, 2016 at 12:16 AM, Jihong Ma <[hidden email]> wrote:
>
>> Hi Kumar,
>>
>> Please attach the proposed format changes to this thread or to the
>> associated JIRA; I would like to take a look.
>>
>> Thanks!
>>
>> Jihong
>>
>> -----Original Message-----
>> From: Jacky Li [mailto:[hidden email]]
>> Sent: Thursday, November 03, 2016 7:54 AM
>> To: [hidden email]
>> Subject: Re: [Discussion] Please vote and comment for carbon data file
>> format change
>>
>> The proposed change is reasonable, +1.
>> But is there a plan to make the reader backward compatible with the old
>> format, so that the impact on current deployments is minimal?
>>
>> Regards,
>> Jacky
>>
>> > On Nov 2, 2016, at 12:38 AM, Kumar Vishal <[hidden email]> wrote:
>> >
>> > Hi Xiaoqiao He,
>> >
>> > Please find the attachment.
>> >
>> > -Regards
>> > Kumar Vishal
>> >
>> > On Tue, Nov 1, 2016 at 9:27 PM, Xiaoqiao He <[hidden email]> wrote:
>> > Hi Kumar Vishal,
>> >
>> > I couldn't see the figures of the file format; could you re-upload them?
>> > Thanks.
>> >
>> > Best Regards
>> >
>> > On Tue, Nov 1, 2016 at 7:12 PM, Kumar Vishal <[hidden email]> wrote:
>> >
>> > >
>> > > Hello All,
>> > >
>> > > Improving Carbon first-time query performance
>> > >
>> > > Reason:
>> > > 1. Once the file system cache has been cleared, file reads are slower,
>> > > because the data has to be read and cached again.
>> > > 2. In a first-time query, Carbon has to read the footer from the data
>> > > file to build the B-tree.
>> > > 3. Carbon reads more footer data (the data chunk) than it requires.
>> > > 4. Carbon performs many random seeks because a column's data (data page,
>> > > RLE, inverted index) is not stored together.
>> > >
>> > > Solution:
>> > > 1. Improve block loading time by removing the data chunk from
>> > > blockletInfo and storing only the offset and length of the data chunk.
>> > > 2. Compress the presence meta bitset stored for null values of measure
>> > > columns using Snappy.
>> > > 3. Store the metadata and data of a column together and read them
>> > > together; this reduces random seeks and improves IO.
>> > >
>> > > To do this, I am planning to change the CarbonData Thrift format.
>> > >
>> > > *Old format*
>> > >
>> > >
>> > >
>> > > *New format*
>> > >
>> > >
>> > >
>> > > Please vote and comment on this new format change.
>> > >
>> > > -Regards
>> > > Kumar Vishal
>> > >
>> >
>>
>>
>
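The three solution points in the proposal above (blocklet metadata that keeps only offsets and lengths, a compressed null-value bitset, and a column's metadata stored next to its data) can be illustrated with a small Python sketch. It is not the actual CarbonData Thrift schema or reader: the chunk header layout and the helpers `write_column_chunk`, `read_column_chunk` and `write_blocklet` are hypothetical, and zlib is used only as a stand-in for Snappy, which is not in the Python standard library. What it shows is that once the blocklet metadata holds just an (offset, length) pair per column, a reader can fetch all of a column's pieces with one seek and one read.

```python
import io
import struct
import zlib
from typing import List, Tuple

# Hypothetical sketch: names and byte layout are illustrative assumptions,
# not CarbonData's real file format. zlib stands in for Snappy here.

# Per-chunk header: byte lengths of the null bitset, data page, RLE and inverted index.
HEADER = struct.Struct("<4I")


def write_column_chunk(out: io.BytesIO, null_bitset: bytes, data_page: bytes,
                       rle: bytes, inverted_index: bytes) -> Tuple[int, int]:
    """Write one column's metadata and data contiguously; return (offset, length)."""
    offset = out.tell()
    bitset = zlib.compress(null_bitset)  # compressed presence/null bitset
    out.write(HEADER.pack(len(bitset), len(data_page), len(rle), len(inverted_index)))
    for part in (bitset, data_page, rle, inverted_index):
        out.write(part)
    return offset, out.tell() - offset


def read_column_chunk(f: io.BytesIO, offset: int,
                      length: int) -> Tuple[bytes, bytes, bytes, bytes]:
    """Fetch a whole column chunk with a single seek and read, then split it in memory."""
    f.seek(offset)
    buf = f.read(length)
    sizes = HEADER.unpack_from(buf, 0)
    parts, pos = [], HEADER.size
    for size in sizes:
        parts.append(buf[pos:pos + size])
        pos += size
    bitset, data_page, rle, inverted_index = parts
    return zlib.decompress(bitset), data_page, rle, inverted_index


def write_blocklet(columns) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Write all column chunks; the hypothetical blocklet info keeps only (offset, length)."""
    out = io.BytesIO()
    blocklet_info = [write_column_chunk(out, *col) for col in columns]
    return out.getvalue(), blocklet_info


if __name__ == "__main__":
    cols = [
        (bytes(64), b"page-a", b"rle-a", b"inv-a"),
        (b"\xff" * 64, b"page-b", b"rle-b", b"inv-b"),
    ]
    data, info = write_blocklet(cols)
    off, length = info[1]
    print(read_column_chunk(io.BytesIO(data), off, length)[1])
```

In this sketch the footer-side metadata shrinks to a few integers per column, which is what makes building the B-tree cheaper; the per-column header moves into the chunk itself and is picked up by the same single read that fetches the data.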
