http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-Data-loading-improvement-tp11429p13078.html
Yes, and if the SORT_COLUMNS fit in 6 bytes after dictionary encoding, our approach can be even better: the whole 8-byte entry (RowId plus sort key) can stay in cache, with no remaining portion of the key left in memory.
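
To make the full-fit case concrete, here is a minimal sketch, not CarbonData code. It assumes three dictionary-encoded sort columns of 16 bits each and a 16-bit RowId; the class and method names are hypothetical. The key goes in the high 48 bits so that ordering the packed longs orders the rows by key, and the sign bit is flipped so that the signed primitive sort matches unsigned byte order.

```java
import java.util.Arrays;

class PackedSortKey {
    // Hypothetical layout, most significant first: [col0:16][col1:16][col2:16][rowId:16].
    static long pack(int c0, int c1, int c2, int rowId) {
        long v = ((long) (c0 & 0xFFFF) << 48)
               | ((long) (c1 & 0xFFFF) << 32)
               | ((long) (c2 & 0xFFFF) << 16)
               | (rowId & 0xFFFF);
        // Flip the sign bit so signed long order equals unsigned order of the packed bytes.
        return v ^ Long.MIN_VALUE;
    }

    // The RowId sits in the low 16 bits, which the sign-bit flip does not touch.
    static int rowId(long packed) {
        return (int) (packed & 0xFFFF);
    }

    // Returns the row ids in sorted key order; each row holds three dictionary keys.
    static int[] sortedRowIds(int[][] keys) {
        long[] packed = new long[keys.length];
        for (int i = 0; i < keys.length; i++) {
            packed[i] = pack(keys[i][0], keys[i][1], keys[i][2], i);
        }
        Arrays.sort(packed); // sorts the primitive array directly, no per-row objects
        int[] order = new int[packed.length];
        for (int i = 0; i < packed.length; i++) {
            order[i] = rowId(packed[i]);
        }
        return order;
    }
}
```

With 2 bytes for the RowId this only addresses 65,536 rows per page, so the split between RowId bits and key bits is a trade-off that would have to be chosen per table.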
> On 22 May 2017, at 5:23 PM, Ravindra Pesala <[hidden email]> wrote:
>
> Hi,
>
> I think you are referring to Tungsten sort, where they try to keep the pointer
> and the key together to get cache-aware computation. It is only possible
> if the sort key always starts with fixed-length keys such as dictionary keys. So
> basically the first few dictionary columns can be kept along with the
> pointer and used to start sorting; if those are equal, we go and retrieve
> the remaining key and compare it.
> It is simple to implement in our current design, as our current
> implementation of unsafe sort is also inspired by Tungsten sort.
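
The prefix-then-fallback comparison described above can be sketched as follows. This is an illustration only, not the Spark or CarbonData implementation: it assumes every row is an int[] of non-negative dictionary keys with at least two columns, packs the first two columns into a long prefix, and uses an insertion sort purely for brevity. All names are hypothetical.

```java
class PrefixSort {
    // Assumed prefix: first two dictionary columns, 32 bits each, both non-negative.
    static long prefix(int[] row) {
        return ((long) row[0] << 32) | (row[1] & 0xFFFFFFFFL);
    }

    // Compare entries i and j: prefixes first, the rest of the row only on a tie.
    static int compare(long[] prefixes, int[] ptrs, int[][] rows, int i, int j) {
        int c = Long.compare(prefixes[i], prefixes[j]);
        if (c != 0) {
            return c; // decided from the compact array alone
        }
        int[] a = rows[ptrs[i]];
        int[] b = rows[ptrs[j]];
        for (int k = 2; k < a.length; k++) { // fallback: fetch the remaining key
            c = Integer.compare(a[k], b[k]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Returns row indexes in sorted order; the rows themselves are never moved.
    static int[] sort(int[][] rows) {
        int n = rows.length;
        long[] prefixes = new long[n];
        int[] ptrs = new int[n];
        for (int i = 0; i < n; i++) {
            prefixes[i] = prefix(rows[i]);
            ptrs[i] = i;
        }
        // Insertion sort for brevity; prefix and pointer are always swapped together.
        for (int i = 1; i < n; i++) {
            for (int j = i; j > 0 && compare(prefixes, ptrs, rows, j - 1, j) > 0; j--) {
                long p = prefixes[j]; prefixes[j] = prefixes[j - 1]; prefixes[j - 1] = p;
                int t = ptrs[j]; ptrs[j] = ptrs[j - 1]; ptrs[j - 1] = t;
            }
        }
        return ptrs;
    }
}
```

A real implementation would replace the insertion sort with an O(n log n) sort over the same prefix and pointer arrays; the comparison logic is the part that matters here.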
>
> Regards,
> Ravindra.
>
> On 22 May 2017 at 09:31, Jacky Li <[hidden email]> wrote:
>
>> For sorting, I think there are more optimizations we can do. I am currently
>> thinking of these:
>> 1. Do not sort the whole TablePage; only the KeyPage is required as the sort
>> key.
>>
>> 2. We should find a more memory-efficient sorting approach than the one based
>> on System.arraycopy, which requires double the space.
>>
>> 3. We should try to hold the KeyPage as well as the RowId in a compact data
>> structure; it is best if it fits in CPU cache. The L3 cache of modern CPUs is
>> larger than 8 MB. For this, I am thinking of an 8-byte encoded format that
>> includes the RowId and SORT_COLUMNS (partial or full), for example 2 bytes for
>> the RowId and the remaining 6 bytes for 2 to 3 columns after dictionary encoding.
>> a) If we can hold the RowId + whole SORT_COLUMNS in 8 bytes, it will
>> be most efficient: sorting can leverage the CPU cache and update in place,
>> so no extra storage is needed, and the RowId + whole SORT_COLUMNS are
>> sorted together.
>> b) If we can only hold the RowId + partial SORT_COLUMNS in 8 bytes,
>> we can employ a strategy like the sort in the Spark Tungsten project (first
>> compare the 8 bytes in cache; if they are equal, compare the remaining bytes
>> in memory).
>>
>> Regards,
>> Jacky
>>
>>> On 22 May 2017, at 10:19 AM, David CaiQiang <[hidden email]> wrote:
>>>
>>> As far as I know, System.arraycopy on an object array is a shallow copy, so I
>>> think KeyPage and TablePage may well have the same performance in
>>> Arrays.sort.
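
The shallow-copy behaviour is easy to check with a small sketch. The class and method names are hypothetical, and the rows are plain Object[] stand-ins rather than real KeyPage or TablePage rows. After System.arraycopy, the destination holds the same row references as the source, so the copy cost depends on the number of rows and not on how wide each row is.

```java
class ShallowCopyDemo {
    // Copies an array of rows and reports whether every copied slot
    // refers to the very same row object as the source slot.
    static boolean sharesRows(int rowCount, int columnsPerRow) {
        Object[][] rows = new Object[rowCount][columnsPerRow];
        Object[][] copy = new Object[rowCount][];
        System.arraycopy(rows, 0, copy, 0, rowCount); // copies references only
        for (int i = 0; i < rowCount; i++) {
            if (copy[i] != rows[i]) {
                return false;
            }
        }
        return true;
    }
}
```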
>>>
>>>
>>> -----
>>> Best Regards
>>> David Cai
>>> --
>>> View this message in context:
>>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSS-Data-loading-improvement-tp11429p13056.html
>>> Sent from the Apache CarbonData Dev Mailing List archive mailing list
>>> archive at Nabble.com.
>>
>>
>
>
> --
> Thanks & Regards,
> Ravi