Re: [Feature ]Design Document for Update/Delete support in CarbonData

Posted by kumarvishal09 on Nov 24, 2016; 9:13am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3159.html

Hi Aniket,

I agree with Vimal's opinion, but that use case will be very rare.

I have one question about this update and delete feature:
when will we start compaction? After each update or delete operation?

-Regards
Kumar Vishal



On Thu, Nov 24, 2016 at 12:05 AM, Aniket Adnaik <[hidden email]>
wrote:

> Hi Vimal,
>
> Thanks for your suggestions.
> For the 1st point, I tend to agree with Manish's comments. But it's worth
> looking into different ways to optimize the performance.
> I guess query performance may take priority over update performance.
> Basically, we may need a better compaction approach to merge
> delta files into regular carbon files to maintain adequate performance.
> For the 2nd point, CarbonData will support updating multiple rows, but not
> updating the same row multiple times in a single update operation. It is
> possible that the join condition in the sub-select of the original update
> statement returns multiple rows from the source table for the same row in
> the target table. This is an ambiguous condition, and the common ways to
> resolve it are to error out, to apply the first matching row, or to apply
> the last matching row. CarbonData will choose to error out and let the user
> resolve the ambiguity, which is the safer and more standard choice.
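
The "error out" rule described above can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not CarbonData code: rows are plain dicts, and the names `apply_update` and `AmbiguousUpdateError` are hypothetical. The check runs before any row is changed, so a rejected update leaves the target untouched.

```python
from collections import defaultdict


class AmbiguousUpdateError(ValueError):
    """Hypothetical error: one target row matches several source rows."""


def apply_update(target, source, key, column):
    """Copy `column` from source rows into target rows that share `key`.

    Returns the number of target rows updated.
    """
    # Group the source rows by join key.
    matches = defaultdict(list)
    for row in source:
        matches[row[key]].append(row)

    # Validate first: refuse the whole update if any target row is ambiguous.
    for row in target:
        if len(matches.get(row[key], ())) > 1:
            raise AmbiguousUpdateError(
                f"target row with {key}={row[key]!r} matches multiple source rows"
            )

    # Every remaining target row has zero or one matching source row.
    updated = 0
    for row in target:
        hit = matches.get(row[key])
        if hit:
            row[column] = hit[0][column]
            updated += 1
    return updated
```

Updating many different target rows in one call is fine in this model; only a target row with two or more matching source rows is rejected.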
>
> Best Regards,
> Aniket
>
> On Wed, Nov 23, 2016 at 4:54 AM, manish gupta <[hidden email]>
> wrote:
>
> > Hi Vimal,
> >
> > I have a few queries regarding the 1st suggestion.
> >
> > 1. Dimensions can be either dictionary or no-dictionary columns. If we
> > update the dictionary file, we will have to maintain 2 flows: one for
> > dictionary columns and one for no-dictionary columns. Will that be OK?
> >
> > 2. We write dictionary files in append mode. Updating a dictionary file
> > would mean completely rewriting it, which would also modify the dictionary
> > metadata and the sort index file. Or is there some other approach to be
> > followed, like maintaining an update delta mapping for the dictionary file?
> >
> > Regards
> > Manish Gupta
> >
> > On Wed, Nov 23, 2016 at 10:47 AM, Vimal Das Kammath <
> > [hidden email]> wrote:
> >
> > > Hi Aniket,
> > >
> > > The design looks sound and the documentation is great.
> > > I have a few suggestions.
> > >
> > > 1) Measure update vs. dimension update: In the case of a dimension
> > > update, for example when the user wants to change dept1 to dept2 for all
> > > users who are under dept1, can we just update the dictionary for faster
> > > performance?
> > > 2) Update semantics (one matching record vs. multiple matching records):
> > > I could not understand this section. I wanted to confirm whether we will
> > > support one update statement updating multiple rows.
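
The dictionary-only update in point 1 can be pictured with a toy Python model of a dictionary-encoded column. This is a sketch under assumptions, not CarbonData's dictionary format: the class `DictionaryColumn` and its methods are hypothetical, and rows simply store integer surrogate keys into an in-memory list.

```python
class DictionaryColumn:
    """Toy dictionary-encoded column: rows hold surrogate keys, not values."""

    def __init__(self, values):
        self.dictionary = []  # surrogate key -> value
        self._key_of = {}     # value -> surrogate key
        self.rows = []        # one surrogate key per row
        for value in values:
            if value not in self._key_of:
                self._key_of[value] = len(self.dictionary)
                self.dictionary.append(value)
            self.rows.append(self._key_of[value])

    def decode(self):
        """Return the column values as the user would see them."""
        return [self.dictionary[key] for key in self.rows]

    def rename_value(self, old, new):
        """Change `old` to `new` by editing the dictionary entry only.

        The row data is never touched. In this toy model the shortcut is
        refused when `new` already exists, because two keys would then
        map to the same value.
        """
        if new in self._key_of:
            raise ValueError(f"{new!r} already has a dictionary entry")
        key = self._key_of.pop(old)
        self.dictionary[key] = new
        self._key_of[new] = key
```

In this model the rename costs one dictionary write regardless of the row count. It also shows why the shortcut is limited: it only covers "change every occurrence of one value", and any structure derived from the dictionary values would need to be refreshed afterwards.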
> > >
> > > -Vimal
> > >
> > > On Tue, Nov 22, 2016 at 2:30 PM, Liang Chen <[hidden email]>
> > > wrote:
> > >
> > > > Hi Aniket,
> > > >
> > > > Thank you for finishing this good design document. A couple of inputs
> > > > from my side:
> > > >
> > > > 1. Please add the info mentioned below (RowID definition etc.) to the
> > > > design document as well.
> > > > 2. On page 6: "Schema change operation can run in parallel with Update or
> > > > Delete operations, but not with another schema change operation". Can
> > > > you explain this item?
> > > > 3. Please unify the terminology: use "CarbonData" instead of "Carbon",
> > > > and use one term consistently for "destination table" and "target table".
> > > > 4. Is the Update operation's delete delta the same as the Delete
> > > > operation's delete delta?
> > > >
> > > > BTW, it would be much better if you could provide a Google Doc for review
> > > > next time; it is really difficult to give comments on PDF documents :)
> > > >
> > > > Regards
> > > > Liang
> > > >
> > > > Aniket Adnaik wrote
> > > > > Hi Sujith,
> > > > >
> > > > > Please see my comments inline.
> > > > >
> > > > > Best Regards,
> > > > > Aniket
> > > > >
> > > > > > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <sujithchacko.2010@> wrote:
> > > > >
> > > > >> Hi Aniket,
> > > > >>
> > > > > >>       It's a well-documented design. I just want to know a few
> > > > > >> points, like:
> > > > > >>
> > > > > >> a.  Format of the RowID and its datatype
> > > > > >>
> > > > > >  AA>> The following format can be used to represent a unique RowID:
> > > > > >
> > > > > >  [<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
> > > > > >
> > > > > >  A simple way would be to use the String data type and store it as a
> > > > > > text file. However, a more efficient way could be to use
> > > > > > Bitsets/Bitmaps as a further optimization. Compressed bitmaps such as
> > > > > > Roaring bitmaps can be used for better performance and efficient
> > > > > > storage.
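
A minimal Python sketch of the String-based RowID described above. The four components come from the format in the message; everything else is an assumption made for illustration: the class name `RowId`, integer-valued components, and the "/" separator are hypothetical and are not taken from the design document.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Hypothetical RowID: the four components named in the format above."""

    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int  # offset of the row inside its blocklet

    def encode(self) -> str:
        # Assumed text form: components joined with "/".
        return "/".join(str(part) for part in self)

    @classmethod
    def decode(cls, text: str) -> "RowId":
        parts = text.split("/")
        if len(parts) != 4:
            raise ValueError(f"expected 4 components, got {len(parts)}: {text!r}")
        return cls(*(int(part) for part in parts))
```

Because a `NamedTuple` compares field by field, sorting a list of `RowId` values orders them by segment, then block, then blocklet, then offset. A bitmap-based variant would instead keep one bitmap of deleted offsets per (segment, block, blocklet) group.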
> > > > >
> > > > > > b.  Impact of this feature on select queries: every time, the query
> > > > > > process has to exclude each deleted record and include the
> > > > > > corresponding updated record. Is any optimization being considered to
> > > > > > tackle the query performance issue, since one of the major highlights
> > > > > > of Carbon is performance?
> > > > > > AA>> Some of the optimizations would be to cache the deltas to avoid
> > > > > > recurrent I/O, to store sorted rowids in the delete delta for
> > > > > > efficient lookup, and to perform regular compaction to minimize the
> > > > > > impact on select query performance.
> > > > > > Additionally, we may have to explore ways to perform compaction
> > > > > > automatically, for example, if more than 25% of rows are read from
> > > > > > deltas.
> > > > > > Please feel free to share if you have any ideas or suggestions.
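
Two of the ideas above, sorted rowids in the delete delta and a threshold-based compaction trigger, can be sketched in Python as follows. This is an illustration only: `DeleteDelta`, `scan`, and `should_compact` are hypothetical names, rowids are plain integers here, and the 25% default simply mirrors the example figure in the message.

```python
import bisect


class DeleteDelta:
    """Toy delete delta: deleted rowids kept sorted for binary search."""

    def __init__(self, deleted_row_ids):
        self._sorted = sorted(set(deleted_row_ids))

    def is_deleted(self, row_id):
        # O(log n) lookup in the sorted list.
        i = bisect.bisect_left(self._sorted, row_id)
        return i < len(self._sorted) and self._sorted[i] == row_id


def scan(rows, delta):
    """Return the values of (row_id, value) pairs not marked as deleted."""
    return [value for row_id, value in rows if not delta.is_deleted(row_id)]


def should_compact(rows_from_deltas, total_rows_read, threshold=0.25):
    """True when the share of rows served from deltas exceeds the threshold."""
    if total_rows_read == 0:
        return False
    return rows_from_deltas / total_rows_read > threshold
```

A scan would consult `is_deleted` for each candidate row, and a background job could call `should_compact` with counters gathered during queries to decide when to merge the deltas back into regular files.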
> > > > >
> > > > > Thanks,
> > > > > Sujith
> > > > >
> > > > > > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <aniket.adnaik@> wrote:
> > > > >
> > > > >> Hi All,
> > > > >>
> > > > >> Please find a design doc for Update/Delete support in CarbonData.
> > > > >>
> > > > > >> https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> > > > >>
> > > > >> Best Regards,
> > > > >> Aniket
> > > > >>
> > > >
> > > > --
> > > > View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3093.html
> > > > Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
> > > >
> > >
> >
>