http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSS-Data-loading-improvement-tp11429p13063.html
and key together to simulate cache aware computation. It is only possible
if the sort keys are always starts with fixed keys like dictionary keys. So
> For sorting, I think more optimization we can do, I am currently thinking
> these:
> 1. Do not sort the whole TablePage, only KeyPage is required as the sort
> key
>
> 2. Should find a more memory efficient sorting algorithm than
> System.arraycopy which requires doubling space.
>
> 3. Should try to hold the KeyPage as well as the RowId in a compact data
> structure, it is best if it fits in CPU cache. Modern L3 CPU is larger than
> 8MB. For this, I am thinking to have a 8 bytes encoded format that includes
> RowID and SORT_COLUMNS (partial or full), for example, 2 bytes for RowId,
> remaining 6 bytes for 2 to 3 columns after dictionary encoding.
> a) If we can hold the RowID + whole SORT_COLUMNS in 8 bytes, it will
> be most efficient to leverage CPU cache to do sorting, use in-place update
> approach while sorting. So no extra storage is needed, and the RowID +
> whole SORT_COLUMNS will be sorted.
> b) If we can only hold the RowID + partial SORT_COLUMNS in 8 bytes,
> we can employ strategy like the sorting in Spark Tungsten project. (first
> compare the 8 bytes in cache, if it equals then compare remaining bytes in
> memory)
>
> Regards,
> Jacky
>
> > 在 2017年5月22日,上午10:19,David CaiQiang <
[hidden email]> 写道:
> >
> > As I known, System.arrayCopy of object array is a shallow copy, so I
> think
> > both KeyPage and TablePage maybe have the same performance on
> Arrays.sort.
> >
> >
> > -----
> > Best Regards
> > David Cai
> > --
> > View this message in context:
http://apache-carbondata-dev-m> ailing-list-archive.1130556.n5.nabble.com/DISCUSS-Data-loadi
> ng-improvement-tp11429p13056.html
> > Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>
>