http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-Data-loading-improvement-tp11429p13063.html
and key together to simulate cache aware computation. It is only possible
if the sort keys are always starts with fixed keys like dictionary keys. So
remaining key and compare it.
implementation of unsafe sort is also inspired from tungsten sort.
Ravindra.
> For sorting, I think more optimization we can do, I am currently thinking
> these:
> 1. Do not sort the whole TablePage, only KeyPage is required as the sort
> key
>
> 2. Should find a more memory efficient sorting algorithm than
> System.arraycopy which requires doubling space.
>
> 3. Should try to hold the KeyPage as well as the RowId in a compact data
> structure, it is best if it fits in CPU cache. Modern L3 CPU is larger than
> 8MB. For this, I am thinking to have a 8 bytes encoded format that includes
> RowID and SORT_COLUMNS (partial or full), for example, 2 bytes for RowId,
> remaining 6 bytes for 2 to 3 columns after dictionary encoding.
> a) If we can hold the RowID + whole SORT_COLUMNS in 8 bytes, it will
> be most efficient to leverage CPU cache to do sorting, use in-place update
> approach while sorting. So no extra storage is needed, and the RowID +
> whole SORT_COLUMNS will be sorted.
> b) If we can only hold the RowID + partial SORT_COLUMNS in 8 bytes,
> we can employ strategy like the sorting in Spark Tungsten project. (first
> compare the 8 bytes in cache, if it equals then compare remaining bytes in
> memory)
>
> Regards,
> Jacky
>
> > 在 2017年5月22日,上午10:19,David CaiQiang <
[hidden email]> 写道:
> >
> > As I known, System.arrayCopy of object array is a shallow copy, so I
> think
> > both KeyPage and TablePage maybe have the same performance on
> Arrays.sort.
> >
> >
> > -----
> > Best Regards
> > David Cai
> > --
> > View this message in context:
http://apache-carbondata-dev-m> ailing-list-archive.1130556.n5.nabble.com/DISCUSS-Data-loadi
> ng-improvement-tp11429p13056.html
> > Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>
>