Re: [Feature] Design Document for Update/Delete support in CarbonData

Posted by manishgupta88 on Nov 23, 2016; 12:54pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3129.html

Hi Vimal,

I have a few queries regarding the 1st suggestion.

1. Dimensions can be either dictionary or no-dictionary columns. If we update
the dictionary file, we will have to maintain two flows: one for dictionary
columns and one for no-dictionary columns. Will that be OK?

2. We write dictionary files in append mode. Updating a dictionary file would
mean completely rewriting it, which would also modify the dictionary
metadata and sort index files. Or is there some other approach that should
be followed, such as maintaining an update delta mapping for the dictionary
file?
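
A minimal sketch of the second option, an update delta mapping kept beside
the append-only dictionary, could look like the following. All class and
method names here are hypothetical and are not CarbonData APIs. The sketch
also assumes the mapping applies only to data loaded before the update, and
it ignores the dictionary metadata and sort index files.

```python
class AppendOnlyDictionary:
    """Toy dictionary: values are only appended; surrogate key = position + 1."""

    def __init__(self):
        self._values = []   # stands in for the append-mode dictionary file
        self._key_of = {}   # value -> surrogate key

    def get_or_add(self, value):
        key = self._key_of.get(value)
        if key is None:
            self._values.append(value)
            key = len(self._values)
            self._key_of[value] = key
        return key

    def value_of(self, key):
        return self._values[key - 1]


class UpdateDeltaMapping:
    """Hypothetical overlay: old surrogate key -> new surrogate key."""

    def __init__(self, dictionary):
        self._dictionary = dictionary
        self._remap = {}

    def rename(self, old_value, new_value):
        old_key = self._dictionary.get_or_add(old_value)
        new_key = self._dictionary.get_or_add(new_value)
        # Keys already redirected to old_key must follow it to new_key,
        # so that decode() stays a single lookup.
        for key, target in list(self._remap.items()):
            if target == old_key:
                self._remap[key] = new_key
        self._remap[old_key] = new_key

    def decode(self, key):
        return self._dictionary.value_of(self._remap.get(key, key))


dictionary = AppendOnlyDictionary()
rows = [dictionary.get_or_add(v) for v in ("dept1", "dept2", "dept1")]
delta = UpdateDeltaMapping(dictionary)
delta.rename("dept1", "dept2")  # the dictionary itself is never rewritten
decoded = [delta.decode(key) for key in rows]  # every row now reads "dept2"
```

The stored surrogate keys stay untouched; only the small overlay changes, so
the append-mode write path for the dictionary is preserved.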

Regards
Manish Gupta

On Wed, Nov 23, 2016 at 10:47 AM, Vimal Das Kammath <[hidden email]> wrote:

> Hi Aniket,
>
> The design looks sound and the documentation is great.
> I have a few suggestions.
>
> 1) Measure update vs. dimension update: in the case of a dimension update,
> for example when the user wants to change dept1 to dept2 for all users who
> are under dept1, can we just update the dictionary for faster performance?
> 2) Update semantics (one matching record vs. multiple matching records): I
> could not understand this section. I wanted to confirm whether we will
> support one update statement updating multiple rows.
>
> -Vimal
>
> On Tue, Nov 22, 2016 at 2:30 PM, Liang Chen <[hidden email]>
> wrote:
>
> > Hi Aniket,
> >
> > Thank you for finishing such a good design document. A couple of inputs
> > from my side:
> >
> > 1. Please also add the information mentioned below (RowID definition
> > etc.) to the design document.
> > 2. On page 6: "Schema change operation can run in parallel with Update or
> > Delete operations, but not with another schema change operation". Can you
> > explain this item?
> > 3. Please unify the terminology: use "CarbonData" instead of "Carbon", and
> > use one term consistently for "destination table" and "target table".
> > 4. Is the Update operation's delete delta the same as the Delete
> > operation's delete delta?
> >
> > BTW, it would be much better if you could provide Google Docs for review
> > next time; it is really difficult to comment on PDF documents :)
> >
> > Regards
> > Liang
> >
> > Aniket Adnaik wrote
> > > Hi Sujith,
> > >
> > > Please see my comments inline.
> > >
> > > Best Regards,
> > > Aniket
> > >
> > > > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <sujithchacko.2010@>
> > > > wrote:
> > >
> > >> Hi Aniket,
> > >>
> > > >>       It's a well-documented design. I just want to know a few points:
> > >>
> > >> a.  Format of the RowID and its datatype
> > >>
> > > >  AA>> The following format can be used to represent a unique RowID:
> > > >
> > > >  [<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
> > > >
> > > >  A simple way would be to use the String data type and store it as a
> > > > text file. However, a more efficient way could be to use
> > > > bitsets/bitmaps as a further optimization. Compressed bitmaps such as
> > > > Roaring bitmaps can be used for better performance and efficient
> > > > storage.
> > >
> > > > b.  Impact of this feature on select queries: since the query process
> > > > has to exclude each deleted record and include the corresponding
> > > > updated record every time, is any optimization being considered to
> > > > tackle the query performance issue? One of the major highlights of
> > > > Carbon is performance.
> > > > AA>> Some of the optimizations would be to cache the deltas to avoid
> > > > repeated I/O, to store sorted RowIDs in the delete delta for efficient
> > > > lookup, and to perform regular compaction to minimize the impact on
> > > > select query performance. Additionally, we may have to explore ways to
> > > > trigger compaction automatically, for example, if more than 25% of
> > > > rows are read from deltas.
> > > > Please feel free to share if you have any ideas or suggestions.
> > >
> > > Thanks,
> > > Sujith
> > >
> > > > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <aniket.adnaik@> wrote:
> > >
> > >> Hi All,
> > >>
> > >> Please find a design doc for Update/Delete support in CarbonData.
> > >>
> > > >> https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> > >>
> > >> Best Regards,
> > >> Aniket
> > >>