
Re: [Feature] Design Document for Update/Delete support in CarbonData

Posted by Aniket Adnaik on Nov 21, 2016; 5:22pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3069.html

Hi Manish,

Yes, I agree; we'll have to include the partition ID if we start supporting
partitioning in the future. There might be other options, such as making the
segment ID unique enough to include the partition ID as part of it.
On a side note, we may also need a transaction ID if we start supporting
transaction semantics in the future.
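A minimal sketch of the row ID discussed in this thread. It assumes integer-valued fields and a `/` delimiter (both hypothetical; the thread fixes neither), and treats the partition ID as an optional leading field so that IDs written before and after partitioning support can coexist. `RowId`, `encode`, and `decode` are illustrative names, not CarbonData APIs.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field delimiter; the design discussion does not specify one.
SEP = "/"


@dataclass(frozen=True)
class RowId:
    """[<Partition ID>]<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>."""
    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int
    partition_id: Optional[int] = None  # absent until partitioning is supported

    def encode(self) -> str:
        """Serialize to text, prefixing the partition ID only when present."""
        parts = [self.segment_id, self.block_id, self.blocklet_id, self.offset]
        if self.partition_id is not None:
            parts.insert(0, self.partition_id)
        return SEP.join(str(p) for p in parts)

    @staticmethod
    def decode(text: str) -> "RowId":
        """Parse text; 5 fields means a leading partition ID, 4 means none."""
        fields = [int(f) for f in text.split(SEP)]
        if len(fields) == 5:
            return RowId(*fields[1:], partition_id=fields[0])
        if len(fields) == 4:
            return RowId(*fields)
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}")
```

Here the field count alone distinguishes partitioned from unpartitioned IDs; a real format would probably want an explicit version marker, especially if a transaction ID is added later.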

Best Regards,
Aniket

On Mon, Nov 21, 2016 at 4:00 AM, manish gupta <[hidden email]>
wrote:

> Hi Aniket,
>
> I think we should also include the partition ID in the RowID format. Carbon
> does not support partitioning yet, but once it does, this format would
> accommodate it.
>
>  [<Partition ID><Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
>
> Regards
> Manish Gupta
>
> On Mon, Nov 21, 2016 at 1:07 PM, Aniket Adnaik <[hidden email]>
> wrote:
>
> > Hi Sujith,
> >
> > Please see my comments inline.
> >
> > Best Regards,
> > Aniket
> >
> > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <[hidden email]> wrote:
> >
> > > Hi Aniket,
> > >
> > >       It's a well-documented design. I just want to know a few points:
> > >
> > > a.  Format of the RowID and its datatype
> > >
> >  AA>> The following format can be used to represent a unique row ID:
> >
> >  [<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
> >  A simple way would be to use the String data type and store it as a text
> >  file. However, a more efficient approach would be to use Bitsets/Bitmaps as
> >  a further optimization. Compressed bitmaps such as Roaring bitmaps can be
> >  used for better performance and more efficient storage.
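To illustrate the bitmap alternative, here is a sketch that keeps one bitset of deleted offsets per blocklet, so the segment, block, and blocklet IDs become the lookup key instead of being repeated in every row ID. It uses a plain Python `int` as an uncompressed bitset, standing in for a compressed structure such as a Roaring bitmap; `BlockletDeleteBitmap` is a hypothetical name.

```python
class BlockletDeleteBitmap:
    """Uncompressed bitset of deleted row offsets within one blocklet.

    Stand-in for a compressed bitmap such as Roaring: bit i set means the
    row at offset i in this blocklet has been deleted.
    """

    def __init__(self) -> None:
        self._bits = 0

    def mark_deleted(self, offset: int) -> None:
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self._bits |= 1 << offset

    def is_deleted(self, offset: int) -> bool:
        return ((self._bits >> offset) & 1) == 1

    def deleted_count(self) -> int:
        return bin(self._bits).count("1")

    def to_bytes(self) -> bytes:
        # Minimal little-endian encoding, e.g. for persisting a delete delta.
        length = (self._bits.bit_length() + 7) // 8
        return self._bits.to_bytes(length, "little")

    @classmethod
    def from_bytes(cls, data: bytes) -> "BlockletDeleteBitmap":
        bitmap = cls()
        bitmap._bits = int.from_bytes(data, "little")
        return bitmap
```

An uncompressed bitset grows with the largest deleted offset (marking offsets 3, 5, and 1000 takes 126 bytes), which is exactly the sparse case where Roaring's container-based compression helps.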
> >
> > > b.  Impact of this feature on select queries: every query now has to
> > > exclude each deleted record and include the corresponding updated record.
> > > Is any optimization being considered to tackle the query performance
> > > issue, since performance is one of the major highlights of Carbon?
> > AA>> Some possible optimizations are to cache the deltas to avoid
> > repeated I/O, to store sorted row IDs in the delete delta for efficient
> > lookup, and to perform regular compaction to minimize the impact on select
> > query performance. Additionally, we may have to explore ways to trigger
> > compaction automatically, for example, when more than 25% of rows are read
> > from deltas.
> > Please feel free to share if you have any ideas or suggestions.
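These optimizations can be sketched as follows, assuming an in-memory dictionary in place of delete delta files: each blocklet's delete delta is loaded once, sorted, and cached (`functools.lru_cache` avoids repeated reads), deleted rows are skipped with a binary search during the scan, and a helper applies the 25% figure from the reply as a compaction trigger. All names (`DELETE_DELTAS`, `load_delete_delta`, `scan_blocklet`, `needs_compaction`) are hypothetical.

```python
from bisect import bisect_left
from functools import lru_cache
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical stand-in for delete delta files:
# (segment_id, block_id, blocklet_id) -> deleted offsets, in arbitrary order.
DELETE_DELTAS: Dict[Tuple[int, int, int], List[int]] = {
    (0, 0, 0): [9, 2, 5],
}

# Threshold taken from the example in the thread (25% of rows read from deltas).
COMPACTION_THRESHOLD = 0.25


@lru_cache(maxsize=1024)
def load_delete_delta(segment_id: int, block_id: int, blocklet_id: int) -> Tuple[int, ...]:
    """Load a blocklet's delete delta once, sorted; later calls hit the cache."""
    return tuple(sorted(DELETE_DELTAS.get((segment_id, block_id, blocklet_id), [])))


def is_deleted(sorted_offsets: Sequence[int], offset: int) -> bool:
    """Binary search over the sorted delete delta: O(log n) per lookup."""
    i = bisect_left(sorted_offsets, offset)
    return i < len(sorted_offsets) and sorted_offsets[i] == offset


def scan_blocklet(segment_id: int, block_id: int, blocklet_id: int,
                  rows: Iterable[str]) -> Iterator[str]:
    """Yield only the rows of a blocklet that are not in its delete delta."""
    deleted = load_delete_delta(segment_id, block_id, blocklet_id)
    for offset, row in enumerate(rows):
        if not is_deleted(deleted, offset):
            yield row


def needs_compaction(rows_from_deltas: int, total_rows: int,
                     threshold: float = COMPACTION_THRESHOLD) -> bool:
    """Suggest compaction once the share of rows served from deltas exceeds the threshold."""
    return total_rows > 0 and rows_from_deltas / total_rows > threshold
```

Because the cache is keyed only by blocklet, it would have to be invalidated (for example with `load_delete_delta.cache_clear()`) whenever a new delete delta is written; otherwise scans could miss recent deletes.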
> >
> > > Thanks,
> > > Sujith
> > >
> > > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <[hidden email]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Please find a design doc for Update/Delete support in CarbonData.
> > > >
> > > > https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> > > >
> > > > Best Regards,
> > > > Aniket
> > > >
> >
>