Apache CarbonData Dev Mailing List archive

Re: questions about carbondata

Posted by weijie on Oct 22, 2016; 4:30am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/questions-about-carbondata-tp2150p2232.html

Thanks for the reply. For 3, I still want to know whether the blocklets
of all the blocks are stored in sequence according to the sorted MDK key.
If so, the globally ordered MDK key of the carbon table would behave like
an HBase rowkey does. Or is the ordering local to each block, with the
block index file managing the block-level index?

On Fri, Oct 21, 2016 at 5:48 PM, 杰 <[hidden email]> wrote:

> hi,
> 1. correct.
> one carbon file is the same as one block. One block has many blocklets as
> well as one file footer, which holds the metadata (B-tree index) of the blocklets.
> One load makes one segment, and one segment has many blocks.
> 2. carbon sorts the dim column data within one blocklet, so the original row
> sequence is lost. Carbon therefore stores the dim column data together with
> the row ids:
> the dim column data is sorted and the row id sequence is reordered to match,
> so the matchup (like Array: index => data) is kept.
> When querying, carbon first finds the expected dim column data (based
> on the filter), then uses the matchup to get the row ids.
> Then, based on the row ids, we can get the measure data.
> This is why the column data is called an inverted index: it maps data =>
> index, not index => data.
> 3. yes.
>
>
>
>
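The matchup described in answer 2 above can be sketched in a few lines of Python. This is only an illustration of the idea (sorted dim values stored next to their original row ids), not CarbonData's actual on-disk encoding; the function names and the `city`/`sales` columns are made up for the example.

```python
from bisect import bisect_left, bisect_right


def build_inverted_index(dim_values):
    """Sort one blocklet's dim column and keep the original row ids alongside."""
    # Pair every value with its original row id, then sort by value.
    pairs = sorted((value, row_id) for row_id, value in enumerate(dim_values))
    sorted_values = [value for value, _ in pairs]
    row_ids = [row_id for _, row_id in pairs]
    return sorted_values, row_ids


def filter_equals(sorted_values, row_ids, target):
    """Binary-search the sorted dim data, then map the hits back to row ids."""
    lo = bisect_left(sorted_values, target)
    hi = bisect_right(sorted_values, target)
    return sorted(row_ids[lo:hi])


# Hypothetical blocklet with one dimension column and one measure column.
city = ["paris", "berlin", "paris", "oslo"]
sales = [10, 20, 30, 40]

sorted_city, city_row_ids = build_inverted_index(city)
matching_rows = filter_equals(sorted_city, city_row_ids, "paris")
# The row ids are then used to fetch the measure data.
paris_sales = [sales[row_id] for row_id in matching_rows]
```

Because the dim data is sorted, the filter is a binary search rather than a scan, and the stored row ids are what let the measure values be located afterwards.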
> ------------------ Original Message ------------------
> From: "weijie tong";<[hidden email]>;
> Sent: Friday, Oct 21, 2016, 4:01 PM
> To: "dev"<[hidden email]>;
>
> Subject: questions about carbondata
>
>
>
> 1. What is the relationship between these terms:
> carbondata file, block, blocklet, carbondata file footer? Once we have a
> batch job loading data into a carbondata table, does that mean the table
> files will be composed of different blocks, where each block is a carbondata
> file composed of many blocklets and one FileFooter, according to
> the carbondata file format?
>
> 2. How is the column data stored as an inverted index?
> What is the dim column data inverted to? How does the inverted index affect a
> query?
>
> 3. Are all the blocklets stored in sequence according to the sorted MDK key?
>
> Hope someone can give a detailed answer.
>
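For reference, the layout confirmed in the reply to question 1 (one load makes one segment, a segment has many blocks, each block is one carbondata file holding many blocklets plus one footer that indexes them) could be modelled roughly as below. This is a sketch under assumptions: the class names are invented, keys are plain integers, and the footer is reduced to a (min key, max key) pair per blocklet as a stand-in for the B-tree index. It deliberately says nothing about whether keys are ordered across blocks, which is the open question in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Blocklet:
    # Assumption: the sorted keys of the rows held in this blocklet.
    keys: List[int]


@dataclass
class Block:
    """One carbondata file: many blocklets plus one footer."""
    blocklets: List[Blocklet]
    footer: List[Tuple[int, int]] = field(init=False)

    def __post_init__(self):
        # Simplified footer: (min key, max key) per blocklet,
        # standing in for the B-tree index mentioned in the reply.
        self.footer = [(b.keys[0], b.keys[-1]) for b in self.blocklets]

    def candidate_blocklets(self, key: int) -> List[int]:
        """Return positions of blocklets whose key range may contain `key`."""
        return [i for i, (lo, hi) in enumerate(self.footer) if lo <= key <= hi]


@dataclass
class Segment:
    """One load: many blocks."""
    blocks: List[Block]


block = Block([Blocklet([1, 3, 5]), Blocklet([6, 8, 9])])
segment = Segment([block])
hits = block.candidate_blocklets(8)
```

With the footer in hand, a reader only has to open the blocklets listed in `hits` instead of scanning the whole file.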