http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3133.html
For the 1st point, I tend to agree with Manish's comments. But, it's worth
the same row multiple times in a single update operation. It is possible
or to apply first matching row, or to apply last matching row. CarbonData
> Hi Vimal,
>
> I have a few queries regarding the 1st suggestion.
>
> 1. Dimensions can be either dictionary or no-dictionary. If we update the
> dictionary file, then we will have to maintain 2 flows: one for dictionary
> columns and one for no-dictionary columns. Will that be OK?
>
> 2. We write dictionary files in append mode. Updating a dictionary file would
> mean completely rewriting it, which would also modify the
> dictionary metadata and sort index file. Or is there some other approach
> that needs to be followed, like maintaining an update delta mapping for the
> dictionary file?
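
The "update delta mapping" idea raised in point 2 can be illustrated with a minimal sketch. Everything here is hypothetical (the surrogate keys, the `update_delta` name, the `decode` helper); it is not how CarbonData stores dictionaries, only one way such an overlay could behave: the append-only dictionary is left untouched and a small old-key to new-key mapping is applied at read time.

```python
# Dictionary-encoded dimension: rows store surrogate keys, not values.
# All keys and values below are made up for illustration.
dictionary = {1: "dept1", 2: "dept2", 3: "dept3"}  # surrogate key -> value
rows = [1, 2, 1, 3]                                # encoded "dept" column

# Hypothetical update delta mapping: old surrogate key -> new surrogate key.
# It records "change dept1 to dept2" without rewriting the dictionary file.
update_delta = {1: 2}


def decode(key, dictionary, update_delta):
    """Resolve a stored key through the delta mapping, then the dictionary."""
    return dictionary[update_delta.get(key, key)]


decoded = [decode(key, dictionary, update_delta) for key in rows]
print(decoded)
```

A mapping like this only covers the case where every row holding one value moves to another value, as in the dept1 to dept2 example. No-dictionary columns would still need a separate flow, which is the concern in point 1.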
>
> Regards
> Manish Gupta
>
> On Wed, Nov 23, 2016 at 10:47 AM, Vimal Das Kammath <[hidden email]> wrote:
>
> > Hi Aniket,
> >
> > The design looks sound and the documentation is great.
> > I have a few suggestions.
> >
> > 1) Measure update vs dimension update: In the case of a dimension update, for
> > example when the user wants to change dept1 to dept2 for all users who are
> > under dept1, can we just update the dictionary for faster performance?
> > 2) Update semantics (one matching record vs multiple matching records): I
> > could not understand this section. I wanted to confirm whether we will support
> > one update statement updating multiple rows.
> >
> > -Vimal
> >
> > On Tue, Nov 22, 2016 at 2:30 PM, Liang Chen <[hidden email]> wrote:
> >
> > > Hi Aniket
> > >
> > > Thanks for completing this good design document. A couple of inputs from
> > > my side:
> > >
> > > 1. Please also add the info mentioned below (RowID definition etc.) to the
> > > design document.
> > > 2. On page 6: "Schema change operation can run in parallel with Update or
> > > Delete operations, but not with another schema change operation". Can you
> > > explain this item?
> > > 3. Please unify the terminology: use "CarbonData" instead of "Carbon", and
> > > use one term consistently for "destination table" and "target table".
> > > 4. Is the Update operation's delete delta the same as the Delete operation's
> > > delete delta?
> > >
> > > BTW, it would be much better if you could provide a Google Doc for review
> > > next time; it is really difficult to comment on a PDF document :)
> > >
> > > Regards
> > > Liang
> > >
> > > Aniket Adnaik wrote
> > > > Hi Sujith,
> > > >
> > > > Please see my comments inline.
> > > >
> > > > Best Regards,
> > > > Aniket
> > > >
> > > > > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <sujithchacko.2010@> wrote:
> > > >
> > > >> Hi Aniket,
> > > >>
> > > > >> It's a well-documented design. I just want to know a few points, like:
> > > >>
> > > >> a. Format of the RowID and its datatype
> > > >>
> > > > > AA>> The following format can be used to represent a unique rowid:
> > > >
> > > > [
> > > > <Segment ID>
> > > > <Block ID>
> > > > <Blocklet ID>
> > > > <Offset in Blocklet>
> > > > ]
> > > > > A simple way would be to use the String data type and store it as a text
> > > > > file. However, a more efficient way could be to use Bitsets/Bitmaps as a
> > > > > further optimization. Compressed bitmaps such as Roaring bitmaps can be
> > > > > used for better performance and efficient storage.
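
As a rough illustration of the rowid format and the two storage options described above, here is a minimal sketch. The `RowId` and `DeleteBitmaps` names, the integer component types, and the "/" separator are all assumptions made for the example; a plain Python integer stands in for the bitmap, whereas a real implementation might use a compressed structure such as a Roaring bitmap.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class RowId:
    """Hypothetical row identifier built from the four components above."""
    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int  # row offset inside the blocklet

    def to_text(self) -> str:
        # Simple string form; the "/" separator is an assumption.
        return f"{self.segment_id}/{self.block_id}/{self.blocklet_id}/{self.offset}"

    @classmethod
    def from_text(cls, text: str) -> "RowId":
        return cls(*(int(part) for part in text.split("/")))


class DeleteBitmaps:
    """One bitmap per blocklet; bit i set means row offset i is deleted."""

    def __init__(self) -> None:
        # (segment, block, blocklet) -> int used as an uncompressed bitset
        self._bits = defaultdict(int)

    def mark_deleted(self, row: RowId) -> None:
        key = (row.segment_id, row.block_id, row.blocklet_id)
        self._bits[key] |= 1 << row.offset

    def is_deleted(self, row: RowId) -> bool:
        key = (row.segment_id, row.block_id, row.blocklet_id)
        return bool((self._bits.get(key, 0) >> row.offset) & 1)


row = RowId(segment_id=0, block_id=2, blocklet_id=5, offset=17)
deleted = DeleteBitmaps()
deleted.mark_deleted(row)
print(row.to_text(), deleted.is_deleted(row))
```

Keying the bitmap by blocklet means only the offset has to be stored per deleted row, which is the main saving over writing the full rowid string for every row.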
> > > >
> > > > > b. Impact of this feature on select queries: since every query
> > > > > has to exclude each deleted record and include the corresponding updated
> > > > > record, is any optimization being considered to tackle the query
> > > > > performance issue? One of the major highlights of Carbon is performance.
> > > > > AA>> Some of the optimizations would be to cache the deltas to avoid
> > > > > recurrent I/O, to store sorted rowids in the delete delta for efficient
> > > > > lookup, and to perform regular compaction to minimize the impact on
> > > > > select query performance.
> > > > > Additionally, we may have to explore ways to perform compaction
> > > > > automatically, for example, if more than 25% of rows are read from deltas.
> > > > > Please feel free to share if you have any ideas or suggestions.
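
Two of the optimizations mentioned above, sorted rowids for lookup and a threshold-based compaction trigger, can be sketched as follows. The function names and the representation of a delete delta as a sorted list of row offsets are assumptions for illustration only; the 25% threshold is the example figure from the reply, not a confirmed setting.

```python
from bisect import bisect_left


def is_deleted(sorted_deleted_offsets, offset):
    """Binary search in a sorted delete delta (list of row offsets)."""
    i = bisect_left(sorted_deleted_offsets, offset)
    return i < len(sorted_deleted_offsets) and sorted_deleted_offsets[i] == offset


def scan_blocklet(rows, sorted_deleted_offsets):
    """Yield the rows of one blocklet, skipping offsets in the delete delta."""
    for offset, row in enumerate(rows):
        if not is_deleted(sorted_deleted_offsets, offset):
            yield row


def should_compact(rows_from_deltas, total_rows_read, threshold=0.25):
    """True when more than `threshold` of the rows read came from deltas."""
    if total_rows_read == 0:
        return False
    return rows_from_deltas / total_rows_read > threshold


visible = list(scan_blocklet(["a", "b", "c", "d"], [1, 3]))
print(visible, should_compact(30, 100))
```

Keeping the delta sorted makes each lookup logarithmic in the number of deleted rows; caching the loaded delta, the other optimization mentioned, would avoid re-reading it for every query.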
> > > >
> > > > Thanks,
> > > > Sujith
> > > >
> > > > > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <aniket.adnaik@> wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> Please find a design doc for Update/Delete support in CarbonData.
> > > >>
> > > >>
> > > > >> https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> > > >>
> > > >> Best Regards,
> > > >> Aniket
> > > >>
> > >
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
http://apache-carbondata-> > > mailing-list-archive.1130556.n5.nabble.com/Feature-Design-
> > > Document-for-Update-Delete-support-in-CarbonData-tp3043p3093.html
> > > Sent from the Apache CarbonData Mailing List archive mailing list
> archive
> > > at Nabble.com.
> > >
> >
>