Re: [Feature ]Design Document for Update/Delete support in CarbonData

Posted by Aniket Adnaik on Nov 23, 2016; 12:22am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3113.html

Hi Liang,

Please see my comments inline.

Best Regards,
Aniket

On Tue, Nov 22, 2016 at 1:00 AM, Liang Chen <[hidden email]> wrote:

> Hi Aniket,
>
> Thanks for finishing the good design document. A couple of inputs from my
> side:
>
> 1. Please add the below-mentioned info (RowID definition etc.) to the design
> document as well.
>
AA>> Yes, it's good to have this info in the document.

> 2. On page 6: "Schema change operation can run in parallel with Update or
> Delete operations, but not with another schema change operation". Can you
> explain this item?
>
AA>> Synchronization for schema change operations, such as a db name change
or a properties change, is handled separately. This allows an update or
delete operation to work in parallel with a schema change operation.

> 3. Please unify the terminology: use "CarbonData" instead of "Carbon", and
> use either "destination table" or "target table" consistently.
>
AA>> Yes, I will update the document accordingly.

> 4. Is the Update operation's delete delta the same as the Delete
> operation's delete delta?
>
AA>> Yes, the delete delta is nothing but the rowids of the qualifying rows
that need to be deleted.
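
To make the answer to point 4 concrete, here is a minimal Python sketch of the idea. It is an illustration only, not CarbonData code: the in-memory row list, the "segment/block/blocklet/offset" rowid strings, and all function names are assumptions made for this example.

```python
# Hypothetical in-memory model: a base segment is a list of (rowid, value) pairs.
base = [("0/0/0/0", 10), ("0/0/0/1", 20), ("0/0/0/2", 30)]

def delete_delta_for(rows, predicate):
    """Collect the rowids of the rows that qualify for deletion."""
    return {rowid for rowid, value in rows if predicate(value)}

def delete(rows, predicate):
    """DELETE: only a delete delta is produced, no new rows."""
    return delete_delta_for(rows, predicate), []

def update(rows, predicate, new_value):
    """UPDATE: the same kind of delete delta, plus the new row versions."""
    delta = delete_delta_for(rows, predicate)
    new_rows = [new_value(value) for rowid, value in rows if rowid in delta]
    return delta, new_rows

def scan(rows, delete_delta, new_rows=()):
    """Read path: skip deleted rowids, then append the updated values."""
    kept = [value for rowid, value in rows if rowid not in delete_delta]
    return kept + list(new_rows)

is_big = lambda v: v >= 20
d_delta, _ = delete(base, is_big)
u_delta, u_rows = update(base, is_big, lambda v: v + 1)
```

In this sketch both operations yield the same set of rowids for the same predicate; the update additionally carries the new versions of the rows.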

>
> BTW, it would be much better if you could provide Google Docs for review
> next time; it is really difficult to give comments based on PDF documents
> :)
>
AA>> Yes, I agree :). Unfortunately, Google Docs totally messed up the
diagrams when I first tried to save the document there. I was unable to
solve that issue, so I uploaded it as a PDF.



> Regards
> Liang
>
> Aniket Adnaik wrote
> > Hi Sujith,
> >
> > Please see my comments inline.
> >
> > Best Regards,
> > Aniket
> >
> > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <sujithchacko.2010@> wrote:
> >
> >> Hi Aniket,
> >>
> >>       It's a well-documented design; I just want to know a few points:
> >>
> >> a.  Format of the RowID and its datatype
> >>
> >  AA>> The following format can be used to represent a unique rowid:
> >
> >  [<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
> >
> >  A simple way would be to use the String data type and store it as a text
> > file.
> > However, a more efficient way could be to use Bitsets/Bitmaps as a further
> > optimization. Compressed bitmaps such as Roaring bitmaps can be used for
> > better performance and efficient storage.
> >
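The rowid layout and the two storage options described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual design: the "/" separator, the integer IDs, and the class names are hypothetical, and the bitset is a plain uncompressed Python integer rather than a Roaring bitmap.

```python
from collections import defaultdict
from typing import NamedTuple

class RowId(NamedTuple):
    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int  # row offset inside the blocklet

    def to_text(self) -> str:
        # The "/" separator is an assumption; the thread only gives the field order.
        return "/".join(str(part) for part in self)

    @classmethod
    def from_text(cls, text: str) -> "RowId":
        return cls(*(int(part) for part in text.split("/")))

class DeleteBitset:
    """One uncompressed bitset per blocklet; bit i set means offset i is deleted."""

    def __init__(self):
        self._bits = defaultdict(int)

    def add(self, rid: RowId) -> None:
        # Key on (segment, block, blocklet); the offset selects the bit.
        self._bits[rid[:3]] |= 1 << rid.offset

    def __contains__(self, rid: RowId) -> bool:
        return bool((self._bits.get(rid[:3], 0) >> rid.offset) & 1)
```

The text form is the simple String option; the bitset form stores one bit per row offset, which is the kind of structure a compressed bitmap library would replace.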
> > b.  Impact of this feature on select queries: every time, the query
> > process has to exclude each deleted record and include the corresponding
> > updated record. Is any optimization being considered to tackle the query
> > performance issue, since one of the major highlights of Carbon is
> > performance?
> > AA>> Some of the optimizations would be to cache the deltas to avoid
> > recurrent I/O, to store sorted rowids in the delete delta for efficient
> > lookup, and to perform regular compaction to minimize the impact on
> > select query performance.
> > Additionally, we may have to explore ways to perform compaction
> > automatically, for example, if more than 25% of rows are read from
> > deltas.
> > Please feel free to share if you have any ideas or suggestions.
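Two of the optimizations mentioned above, sorted rowids for lookup and a threshold-based compaction trigger, can be sketched in Python as below. This is only an illustration under assumptions: rowids are modeled as (segment, block, blocklet, offset) tuples, the function names are hypothetical, and the 25% default is the example figure from this thread, not a decided value. Caching of deltas is not modeled here.

```python
from bisect import bisect_left

def is_deleted(sorted_delete_delta, rowid):
    """Binary-search a sorted delete delta instead of scanning it linearly."""
    i = bisect_left(sorted_delete_delta, rowid)
    return i < len(sorted_delete_delta) and sorted_delete_delta[i] == rowid

def needs_compaction(rows_from_deltas, total_rows_read, threshold=0.25):
    """True once the share of rows served from deltas exceeds the threshold."""
    if total_rows_read == 0:
        return False
    return rows_from_deltas / total_rows_read > threshold

# Rowids modeled as (segment, block, blocklet, offset) tuples, kept sorted.
delete_delta = sorted([(0, 0, 1, 7), (0, 0, 0, 3), (1, 2, 0, 0)])
```

Keeping the delta sorted makes each lookup logarithmic in the delta size, and the ratio check is one simple way a query path could signal that compaction is due.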
> >
> > Thanks,
> > Sujith
> >
> > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <aniket.adnaik@> wrote:
> >
> >> Hi All,
> >>
> >> Please find a design doc for Update/Delete support in CarbonData.
> >>
> >> https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> >>
> >> Best Regards,
> >> Aniket
> >>