http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3063.html
I think in RowID format we should also include partitionID. Currently
partitioning, this format would comply with it.
> Hi Sujith,
>
> Please see my comments inline.
>
> Best Regards,
> Aniket
>
> On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <[hidden email]> wrote:
>
> > Hi Aniket,
> >
> > It's a well-documented design. I just want to clarify a few points:
> >
> > a. Format of the RowID and its datatype
> >
> AA>> The following format can be used to represent a unique RowID:
>
> [<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
> A simple approach would be to use the String data type and store the RowIDs
> in a text file. A more efficient approach, as a further optimization, could
> be to use bitsets/bitmaps. Compressed bitmaps such as Roaring bitmaps can be
> used for better performance and more efficient storage.
>
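To make the four-part RowID format above concrete, here is a minimal sketch of the String representation. The thread only names the components and their order; the integer component types, the "/" separator, and the `RowId` class name are assumptions made for illustration.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Composite row identifier; field order follows the format quoted above."""
    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int  # row offset inside the blocklet

    def encode(self) -> str:
        # The "/" separator is an assumption of this sketch, not part of the design.
        return "/".join(str(part) for part in self)

    @classmethod
    def decode(cls, text: str) -> "RowId":
        parts = text.split("/")
        if len(parts) != 4:
            raise ValueError(f"expected 4 components, got {len(parts)}: {text!r}")
        return cls(*(int(p) for p in parts))


rid = RowId(segment_id=2, block_id=0, blocklet_id=5, offset=1307)
assert RowId.decode(rid.encode()) == rid  # round-trips through the text form
```

Because the fields are compared in order, such RowIDs sort by segment, then block, then blocklet, then offset, which is convenient if they are later stored sorted.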
> > b. Impact of this feature on select queries: every time, the query process
> > has to exclude each deleted record and include the corresponding updated
> > record. Is any optimization being considered to tackle the query
> > performance issue, since one of the major highlights of Carbon is
> > performance?
> AA>> Some of the optimizations would be to cache the deltas to avoid
> repeated I/O, to store sorted RowIDs in the delete delta for efficient
> lookup, and to perform regular compaction to minimize the impact on select
> query performance. Additionally, we may have to explore ways to trigger
> compaction automatically, for example, if more than 25% of rows are read
> from deltas.
> Please feel free to share if you have any ideas or suggestions.
>
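The optimizations mentioned above can be sketched roughly as follows: a delete delta held in memory (the cache), kept sorted so that a scan can check each row with a binary search, plus a simple threshold check for automatic compaction. All names here (`DeleteDelta`, `scan`, `needs_compaction`) are hypothetical, and the check compares the share of rows in deltas against the 25% figure, which is only one possible reading of the example given in the thread.

```python
from bisect import bisect_left
from typing import Iterable, Iterator, List, Tuple

# (segment, block, blocklet, offset); a tuple form of the RowID is assumed here.
RowKey = Tuple[int, int, int, int]


class DeleteDelta:
    """Cached view of a delete delta, kept sorted for binary search."""

    def __init__(self, deleted: Iterable[RowKey]) -> None:
        # Deduplicate and sort once, so every later lookup is O(log n).
        self._sorted: List[RowKey] = sorted(set(deleted))

    def is_deleted(self, key: RowKey) -> bool:
        i = bisect_left(self._sorted, key)
        return i < len(self._sorted) and self._sorted[i] == key

    def __len__(self) -> int:
        return len(self._sorted)


def scan(rows: Iterable[Tuple[RowKey, str]],
         delta: DeleteDelta) -> Iterator[Tuple[RowKey, str]]:
    """Yield only the rows whose key is not recorded in the delete delta."""
    for key, value in rows:
        if not delta.is_deleted(key):
            yield key, value


def needs_compaction(delta_rows: int, total_rows: int,
                     threshold: float = 0.25) -> bool:
    """True when the share of rows held in deltas exceeds the threshold."""
    return total_rows > 0 and delta_rows / total_rows > threshold
```

A scan would build the `DeleteDelta` once per block and reuse it across queries, so the delta file is not re-read on every select.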
> Thanks,
> Sujith
>
> On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <[hidden email]> wrote:
>
> > Hi All,
> >
> > Please find a design doc for Update/Delete support in CarbonData.
> >
> >
> > https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> >
> > Best Regards,
> > Aniket
> >
>