
Re: [Discussion] Please vote and comment for carbon data file format change

Posted by Jean-Baptiste Onofré on Dec 10, 2016; 8:37am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Please-vote-and-comment-for-carbon-data-file-format-change-tp2491p4050.html

+1

Regards
JB

On Dec 10, 2016, at 09:33, "bill.zhou" <[hidden email]> wrote:

>+1, this modification will help in all scenarios.
>
>Kumar Vishal wrote:
>> Hello All,
>>
>> Improving Carbon first-time query performance
>>
>> Reasons:
>> 1. Once the file system cache is cleared, data must be read from disk
>> again, which makes reading and caching slower.
>> 2. On a first-time query, Carbon has to read the footer from the data
>> file to build the B-tree.
>> 3. Carbon reads more footer data than it needs (the data chunks).
>> 4. Carbon performs many random seeks because a column's data (data
>> page, RLE, inverted index) is not stored together.
>>
>> Solution:
>> 1. Improve block loading time by removing the data chunk from
>> blockletInfo and storing only the offset and length of each data chunk.
>> 2. Compress the presence meta bitset stored for null values of measure
>> columns using Snappy.
>> 3. Store a column's metadata and data together and read them together;
>> this reduces random seeks and improves IO.
>>
>> For this, I am planning to change the CarbonData Thrift format.
>>
>> *Old format*
>> [diagram not preserved]
>>
>> *New format*
>> [diagram not preserved]
>>
>> Please vote and comment on this proposed format change.
>>
>> -Regards
>> Kumar Vishal
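For readers weighing the vote, here is a minimal Java sketch of the proposed layout idea, under stated assumptions: the footer keeps only an offset and length per column chunk (solution 1), each column's parts (data page, RLE, inverted index) are written back to back so one contiguous read returns them (solution 3), and the null-value presence bitset is compressed (solution 2). The names `ChunkIndex`, `writeBlocklet`, `readColumn` and `compressPresence` are hypothetical and are not CarbonData's actual Thrift structs or classes. Because Snappy is not part of the JDK, `java.util.zip.Deflater` is swapped in as a stand-in compressor.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.zip.Deflater;

public class ColumnChunkLayoutSketch {

    /** Hypothetical footer entry: where a column chunk lives, not the chunk bytes. */
    static final class ChunkIndex {
        final int offset;
        final int length;
        ChunkIndex(int offset, int length) { this.offset = offset; this.length = length; }
    }

    /**
     * Appends each column's parts (e.g. data page, RLE page, inverted index) back to back
     * and returns only offset/length per column, as the footer would store it.
     */
    static List<ChunkIndex> writeBlocklet(ByteArrayOutputStream file, List<byte[][]> columns) {
        List<ChunkIndex> index = new ArrayList<>();
        for (byte[][] parts : columns) {
            int offset = file.size();
            for (byte[] part : parts) {
                // Length-prefix every part so the reader can split the contiguous chunk.
                file.write(ByteBuffer.allocate(4).putInt(part.length).array(), 0, 4);
                file.write(part, 0, part.length);
            }
            index.add(new ChunkIndex(offset, file.size() - offset));
        }
        return index;
    }

    /** One contiguous read per column instead of several scattered seeks. */
    static List<byte[]> readColumn(byte[] file, ChunkIndex idx) {
        ByteBuffer chunk = ByteBuffer.wrap(file, idx.offset, idx.length).slice();
        List<byte[]> parts = new ArrayList<>();
        while (chunk.hasRemaining()) {
            byte[] part = new byte[chunk.getInt()];
            chunk.get(part);
            parts.add(part);
        }
        return parts;
    }

    /** Stand-in for Snappy (assumption): Deflater compresses the null-value presence bitset. */
    static byte[] compressPresence(BitSet nullRows) {
        Deflater deflater = new Deflater();
        deflater.setInput(nullRows.toByteArray());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Sparse nulls in a measure column: the raw bitset is mostly zero bytes.
        BitSet nulls = new BitSet();
        nulls.set(5);
        nulls.set(9000);
        byte[] presence = compressPresence(nulls);

        List<byte[][]> columns = new ArrayList<>();
        columns.add(new byte[][] {"dataPage0".getBytes(StandardCharsets.UTF_8),
                                  "rle0".getBytes(StandardCharsets.UTF_8),
                                  "invIdx0".getBytes(StandardCharsets.UTF_8)});
        // Measure column: data page stored together with its compressed presence bitset.
        columns.add(new byte[][] {"dataPage1".getBytes(StandardCharsets.UTF_8), presence});

        ByteArrayOutputStream file = new ByteArrayOutputStream();
        List<ChunkIndex> footer = writeBlocklet(file, columns);
        List<byte[]> col0 = readColumn(file.toByteArray(), footer.get(0));
        System.out.println(new String(col0.get(1), StandardCharsets.UTF_8)); // prints "rle0"
        System.out.println(nulls.toByteArray().length + " -> " + presence.length + " bytes");
    }
}
```

The design point the sketch tries to capture is that loading a block only needs the small offset/length table, while the column bytes are fetched later with a single positioned read per column; how CarbonData actually encodes these fields in Thrift is not shown here.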