Question about the order between original values and their encoded values


Ma Yun 马云
Hi dev team,

I have a question about dictionary encoding.
As you know, the original values of a dimension column are encoded as integers and stored in the carbon file, ordered by the encoded values.
I tested an ORDER BY query on a dimension column on my local machine. I changed some code so that it sorts by the encoded values first and then decodes them to the original values.
The query results are correct. It seems the encoded values have the same order as the original values.
But CarbonData always decodes to the original values first and then orders by the original values.

Could you tell me in which scenarios the order of the encoded values differs from the order of the original values?
BTW, is there any document that explains the dictionary encoding algorithm?

Thanks

Ma, yun
Re: Question about the order between original values and their encoded values

ravipesala
Hi,
Yes, it works because we sort the column values before assigning
dictionary values to them. So it works only if you have loaded the data
just once (that is, there is no incremental load). If you do an incremental
load and more dictionary values are added to the store, there is no
guarantee that sorting on the encoded data gives a sorted result.
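
To illustrate the point, here is a minimal Python sketch. It is not CarbonData's actual code. It assumes, as a simplification, that surrogate keys start at 1, that each load sorts only the values it has not seen before, and that those new values are appended after the existing keys. The names assign_surrogates and order_preserved are hypothetical.

```python
def assign_surrogates(values, dictionary=None):
    """Return a copy of the dictionary extended with unseen values.

    Assumption: only the new values of this load are sorted, and they get
    keys that continue after the largest existing key.
    """
    dictionary = dict(dictionary or {})
    next_key = len(dictionary) + 1
    new_values = sorted({v for v in values if v not in dictionary})
    for value in new_values:
        dictionary[value] = next_key
        next_key += 1
    return dictionary


def order_preserved(dictionary):
    """True if sorting by encoded key gives the same order as sorting by value."""
    by_value = sorted(dictionary)
    by_key = sorted(dictionary, key=dictionary.get)
    return by_value == by_key


# Single load: values are sorted before keys are assigned.
first_load = assign_surrogates(["usa", "china", "india"])
print(first_load, order_preserved(first_load))
# {'china': 1, 'india': 2, 'usa': 3} True

# Incremental load: "brazil" is new, so it is appended with key 4.
second_load = assign_surrogates(["brazil", "india"], first_load)
print(second_load, order_preserved(second_load))
# {'china': 1, 'india': 2, 'usa': 3, 'brazil': 4} False
```

After the second load, "brazil" sorts first by value but last by key, so ordering on the encoded data alone would give a wrong result.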

Regards,
Ravindra.

In reply to the message from Ma Yun 马云 of 16 February 2017 at 15:46.

--
Thanks & Regards,
Ravi