Posted by 杰 on Oct 21, 2016; 9:48am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/questions-about-carbondata-tp2150p2164.html
hi,
1. Correct.
One carbon file is the same as one block. A block has many blocklets plus one file footer, which holds the metadata (B-tree index) of the blocklets.
One load creates one segment, and a segment has many blocks.
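To make the nesting concrete, here is a minimal sketch of the hierarchy described above. All class and field names are hypothetical and chosen only for illustration; they are not CarbonData's actual classes, and the footer contents are reduced to one placeholder entry per blocklet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blocklet:
    rows: int  # number of rows held by this blocklet


@dataclass
class FileFooter:
    # Hypothetical stand-in for the B-tree index: one entry per blocklet.
    blocklet_index: List[str]


@dataclass
class Block:
    # One block corresponds to one carbon file.
    blocklets: List[Blocklet]
    footer: FileFooter


@dataclass
class Segment:
    # One load produces one segment, which holds many blocks.
    blocks: List[Block]


def make_block(rows_per_blocklet: List[int]) -> Block:
    """Build a block whose footer has one index entry per blocklet."""
    blocklets = [Blocklet(rows=n) for n in rows_per_blocklet]
    footer = FileFooter(
        blocklet_index=[f"blocklet-{i}" for i in range(len(blocklets))]
    )
    return Block(blocklets=blocklets, footer=footer)


# One load: a segment with two blocks (files), each with its own footer.
segment = Segment(blocks=[make_block([100, 100]), make_block([50])])
```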
2. Carbon sorts the dim column data within one blocklet, which loses the original row sequence, so Carbon stores the row id together with the dim column data.
When the dim column data is sorted, the row id sequence is reordered correspondingly, so the mapping (like an array: index => data) is kept.
At query time, Carbon first finds the expected dim column data (based on the filter), then uses the mapping to get the row ids.
Then, based on the row ids, it can get the measure data.
This is why the column data is called an inverted index: it means data => index, not index => data.
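The steps above can be sketched in a few lines of Python. This is only an illustration of the idea under simplifying assumptions (one in-memory blocklet, an equality filter, plain lists); the function names and the sample `city` and `sales` columns are made up and do not reflect CarbonData's actual code.

```python
from bisect import bisect_left, bisect_right


def build_inverted_index(dim_values):
    """Sort a dim column and keep each value's original row id beside it."""
    pairs = sorted((value, row_id) for row_id, value in enumerate(dim_values))
    sorted_values = [value for value, _ in pairs]
    row_ids = [row_id for _, row_id in pairs]
    return sorted_values, row_ids


def filter_equals(sorted_values, row_ids, wanted):
    """Binary-search the sorted dim data, then map the hits back to row ids."""
    lo = bisect_left(sorted_values, wanted)
    hi = bisect_right(sorted_values, wanted)
    return sorted(row_ids[lo:hi])


# Hypothetical blocklet: one dim column and one measure column.
city = ["paris", "berlin", "paris", "tokyo", "berlin"]
sales = [10, 20, 30, 40, 50]

sorted_city, city_row_ids = build_inverted_index(city)

# Filter on the dim column first, then use the row ids to read the measure.
matching_rows = filter_equals(sorted_city, city_row_ids, "paris")
paris_sales = [sales[row_id] for row_id in matching_rows]
```

Here `sorted_city` is the sorted dim data and `city_row_ids` is the reordered row id sequence, so the lookup goes from data to row id rather than from row id to data.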
3. yes.
------------------ Original Message ------------------
From: "weijie tong" <[hidden email]>
Sent: Friday, Oct 21, 2016, 4:01 PM
To: "dev" <[hidden email]>
Subject: questions about carbondata
1. What's the relationship between these terms:
carbondata file, block, blocklet, carbondata file footer? Once we have a
batch job to load data into a carbondata table, does that mean the table
file will be composed of different blocks, and each block is a carbondata
file which is composed of many blocklets and one FileFooter, according to
the carbondata file format?
2. How is the column data stored as an inverted index?
The dim column data is inverted to what? How does the inverted index affect a
query?
3. Are all the blocklets stored in sequence according to the sorted MDK key?
Hope someone can give a detailed answer.