Re: [Feature ]Design Document for Update/Delete support in CarbonData

Posted by Vimal Das Kammath on Nov 23, 2016; 5:17am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3116.html

Hi Aniket,

The design looks sound and the documentation is great.
I have a few suggestions.

1) Measure update vs dimension update: In the case of a dimension update,
for example when a user wants to change dept1 to dept2 for all users who are
under dept1, can we just update the dictionary for faster performance?
2) Update Semantics (one matching record vs multiple matching records): I
could not understand this section. I wanted to confirm whether we will
support one update statement updating multiple rows.
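
To make suggestion 1 concrete, here is a minimal sketch of a dictionary-only
update. It assumes a dictionary-encoded dimension column where rows store
surrogate keys only. The names `dictionary`, `column` and
`update_via_dictionary` are hypothetical and are not CarbonData's actual
structures or API.

```python
# Hypothetical dictionary-encoded column (illustration only, not CarbonData code).
dictionary = {1: "dept1", 2: "dept2", 3: "dept3"}  # surrogate key -> value
column = [1, 2, 1, 3, 1]                           # rows store surrogate keys only


def update_via_dictionary(dictionary, old_value, new_value):
    """Rename a dimension value by editing dictionary entries; rows are untouched."""
    touched = [key for key, value in dictionary.items() if value == old_value]
    for key in touched:
        dictionary[key] = new_value
    return touched


def decode(dictionary, column):
    """Translate the stored surrogate keys back into dimension values."""
    return [dictionary[key] for key in column]


touched = update_via_dictionary(dictionary, "dept1", "dept2")
decoded = decode(dictionary, column)
```

One consequence visible in the sketch: after the update, keys 1 and 2 both
decode to "dept2", so the dictionary is no longer one-to-one and any filter
or group-by that works on surrogate keys would have to account for that.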

-Vimal

On Tue, Nov 22, 2016 at 2:30 PM, Liang Chen <[hidden email]> wrote:

> Hi Aniket
>
> Thanks for finishing this good design document. A couple of inputs from my
> side:
>
> 1. Please add the information mentioned below (RowID definition, etc.) to
> the design document as well.
> 2. On page 6: "Schema change operation can run in parallel with Update or
> Delete operations, but not with another schema change operation". Can you
> explain this item?
> 3. Please unify the terminology: use "CarbonData" instead of "Carbon", and
> use one term consistently for "destination table" and "target table".
> 4. Is the Update operation's delete delta the same as the Delete
> operation's delete delta?
>
> BTW, it would be much better if you could provide a Google Doc for review
> next time; it is really difficult to comment on a PDF document :)
>
> Regards
> Liang
>
> Aniket Adnaik wrote
> > Hi Sujith,
> >
> > Please see my comments inline.
> >
> > Best Regards,
> > Aniket
> >
> > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <sujithchacko.2010@...>
> > wrote:
> >
> >> Hi Aniket,
> >>
> >>       It's a well-documented design. I just want to know a few points:
> >>
> >> a.  Format of the RowID and its datatype
> >>
> >  AA>> The following format can be used to represent a unique rowid:
> >
> >  [
> > <Segment ID>
> > <Block ID>
> > <Blocklet ID>
> > <Offset in Blocklet>
> > ]
> >  A simple way would be to use the String data type and store it as a text
> > file.
> > However, a more efficient way could be to use bitsets/bitmaps as a further
> > optimization. Compressed bitmaps such as Roaring bitmaps can be used for
> > better performance and efficient storage.
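
The two representations described above can be sketched as follows. This is
only an illustration under stated assumptions: the four components are
treated as integers, "/" is an arbitrary separator, and a plain Python int
stands in for a bitset (a compressed bitmap such as Roaring would replace
it in practice). None of these names come from the design document.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Hypothetical rowid: [segment, block, blocklet, offset in blocklet]."""
    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int

    def encode(self) -> str:
        # String form, e.g. for storing rowids in a text file.
        return "/".join(str(part) for part in self)

    @classmethod
    def decode(cls, text: str) -> "RowId":
        return cls(*(int(part) for part in text.split("/")))


def build_bitsets(row_ids):
    """Group deleted offsets per (segment, block, blocklet) into one int bitset."""
    bitsets = {}
    for rid in row_ids:
        key = rid[:3]
        bitsets[key] = bitsets.get(key, 0) | (1 << rid.offset)
    return bitsets


def is_deleted(bitsets, rid):
    """Test the bit for this row's offset in its blocklet's bitset."""
    return (bitsets.get(rid[:3], 0) >> rid.offset) & 1 == 1
```

The string form is easy to inspect and debug, while the per-blocklet bitset
makes the "is this row deleted?" check a single bit test.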
> >
> > b.  Impact of this feature on select queries: since the query process
> > has to exclude each deleted record and include the corresponding updated
> > record every time, is any optimization being considered to tackle the
> > query performance issue? Performance is one of the major highlights of
> > Carbon.
> > AA>> Some of the optimizations would be to cache the deltas to avoid
> > repeated I/O, to store sorted rowids in the delete delta for efficient
> > lookup, and to perform regular compaction to minimize the impact on
> > select query performance.
> > Additionally, we may have to explore ways to perform compaction
> > automatically, for example, if more than 25% of rows are read from
> > deltas.
> > Please feel free to share if you have any ideas or suggestions.
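
A minimal sketch of two of the optimizations mentioned above: lookup in a
sorted delete delta, and an automatic compaction trigger. It assumes rowids
are comparable tuples kept in sorted order and already cached in memory.
The function names and the shape of `base_rows` are hypothetical, and the
25% threshold is just the example figure from the reply.

```python
from bisect import bisect_left


def is_deleted(sorted_row_ids, row_id):
    """Binary search in the sorted delete delta: O(log n) per lookup."""
    i = bisect_left(sorted_row_ids, row_id)
    return i < len(sorted_row_ids) and sorted_row_ids[i] == row_id


def scan(base_rows, sorted_deleted_ids):
    """Return the rows of a select, skipping any row listed in the delete delta."""
    return [row for row_id, row in base_rows
            if not is_deleted(sorted_deleted_ids, row_id)]


def needs_compaction(rows_from_deltas, total_rows, threshold=0.25):
    """Trigger compaction when more than `threshold` of rows come from deltas."""
    return total_rows > 0 and rows_from_deltas / total_rows > threshold
```

For example, with rowids `(0, 0, 0, 0)` and `(0, 0, 0, 2)` in the delete
delta, a scan over four base rows with offsets 0 to 3 keeps only the rows
at offsets 1 and 3.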
> >
> > Thanks,
> > Sujith
> >
> > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <aniket.adnaik@...> wrote:
> >
> >> Hi All,
> >>
> >> Please find a design doc for Update/Delete support in CarbonData.
> >>
> >> https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> >>
> >> Best Regards,
> >> Aniket
> >>