http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3183.html
Yes, valid point. There have been thoughts about this; there is a lot of
scope for optimizing the compaction strategies. We may even consider
triggering compaction automatically in the future.
> Hi Aniket,
>
> I think if the update/delete touches only a small amount of data, then
> horizontal compaction can run based on user configuration. But if a large
> amount of data is updated, it is better to start vertical compaction
> immediately. This is because we are not physically deleting the data from
> disk: if a large fraction of the data is updated (more than 60%), then at
> query time we must first read the older data, exclude the deleted records,
> and include the update delta file data. In that case more data comes into
> memory; we can avoid this by starting vertical compaction immediately
> after the update/delete.
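The threshold-based choice described above could be sketched as follows. This is a minimal illustration, not CarbonData's actual code; the 60% threshold mirrors the figure mentioned in the thread and would be configurable in practice.

```python
def choose_compaction(updated_fraction, threshold=0.6):
    """Pick a compaction strategy based on the fraction of rows updated.

    updated_fraction: fraction of rows touched by the update/delete (0.0-1.0).
    threshold: illustrative cut-over point; 0.6 matches the 60% figure
    mentioned in the thread.
    """
    if updated_fraction >= threshold:
        # Large update: merge the delta files back into regular carbon files
        # right away, so queries need not combine base data, delete delta,
        # and update delta at read time.
        return "vertical"
    # Small update: defer to the user-configured horizontal compaction.
    return "horizontal"
```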
>
> -Regards
> Kumar Vishal
>
> On Thu, Nov 24, 2016 at 2:43 PM, Kumar Vishal <[hidden email]> wrote:
>
> > Hi Aniket,
> >
> > I agree with Vimal's opinion, but that use case will be quite rare.
> >
> > I have one question about this update and delete feature:
> > when will we start compaction after each update or delete operation?
> >
> > -Regards
> > Kumar Vishal
> >
> >
> >
> > On Thu, Nov 24, 2016 at 12:05 AM, Aniket Adnaik <[hidden email]> wrote:
> >
> >> Hi Vimal,
> >>
> >> Thanks for your suggestions.
> >> For the 1st point, I tend to agree with Manish's comments. But it's
> >> worth looking into different ways to optimize the performance.
> >> I guess query performance may take priority over update performance.
> >> Basically, we may need a better compaction approach to merge
> >> delta files into regular carbon files to maintain adequate performance.
> >> For the 2nd point, CarbonData will support updating multiple rows, but
> >> not the same row multiple times in a single update operation. It is
> >> possible that the join condition in the sub-select of the original
> >> update statement yields multiple rows from the source table for the
> >> same row in the target table. This is an ambiguous condition, and the
> >> common ways to resolve it are to error out, to apply the first matching
> >> row, or to apply the last matching row. CarbonData will choose to error
> >> out and let the user resolve the ambiguity, which is the
> >> safer/standard choice.
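The error-out semantics described above can be sketched as follows. This is a simplified illustration over in-memory rows, with all names (`apply_update`, `join_key`, etc.) invented for the example; it is not CarbonData's actual implementation.

```python
from collections import Counter

def apply_update(target_rows, source_rows, join_key, update_col):
    """Apply an update from source to target, erroring out when the join
    yields more than one source row for the same target row (the ambiguous
    condition described in the thread). Rows are plain dicts.
    """
    # Detect join keys that match more than one source row.
    matches = Counter(row[join_key] for row in source_rows)
    dupes = sorted(k for k, n in matches.items() if n > 1)
    if dupes:
        # CarbonData's stated choice: error out rather than silently
        # picking the first or last matching row.
        raise ValueError(f"ambiguous update: multiple source rows for keys {dupes}")
    source_by_key = {row[join_key]: row for row in source_rows}
    for row in target_rows:
        if row[join_key] in source_by_key:
            row[update_col] = source_by_key[row[join_key]][update_col]
    return target_rows
```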
> >>
> >> Best Regards,
> >> Aniket
> >>
> >> On Wed, Nov 23, 2016 at 4:54 AM, manish gupta <[hidden email]> wrote:
> >>
> >> > Hi Vimal,
> >> >
> >> > I have a few queries regarding the 1st suggestion.
> >> >
> >> > 1. Dimensions can be both dictionary and no-dictionary. If we update
> >> > the dictionary file, then we will have to maintain two flows, one for
> >> > dictionary columns and one for no-dictionary columns. Will that be ok?
> >> >
> >> > 2. We write dictionary files in append mode. Updating dictionary
> >> > files would mean completely rewriting the dictionary file, which
> >> > would also modify the dictionary metadata and sort index file; or
> >> > some other approach needs to be followed, like maintaining an update
> >> > delta mapping for the dictionary file.
> >> >
> >> > Regards
> >> > Manish Gupta
> >> >
> >> > On Wed, Nov 23, 2016 at 10:47 AM, Vimal Das Kammath <[hidden email]> wrote:
> >> >
> >> > > Hi Aniket,
> >> > >
> >> > > The design looks sound and the documentation is great.
> >> > > I have a few suggestions.
> >> > >
> >> > > 1) Measure update vs dimension update: In case of a dimension
> >> > > update, for example, the user wants to change dept1 to dept2 for
> >> > > all users who are under dept1. Can we just update the dictionary
> >> > > for faster performance?
> >> > > 2) Update semantics (one matching record vs multiple matching
> >> > > records): I could not understand this section. I wanted to confirm
> >> > > whether we will support one update statement updating multiple rows.
> >> > >
> >> > > -Vimal
> >> > >
> >> > > On Tue, Nov 22, 2016 at 2:30 PM, Liang Chen <[hidden email]> wrote:
> >> > >
> >> > > > Hi Aniket
> >> > > >
> >> > > > Thank you for finishing these good design documents. A couple of
> >> > > > inputs from my side:
> >> > > >
> >> > > > 1. Please add the below-mentioned info (rowid definition, etc.)
> >> > > > to the design document as well.
> >> > > > 2. On page 6: "Schema change operation can run in parallel with
> >> > > > Update or Delete operations, but not with another schema change
> >> > > > operation" - can you explain this item?
> >> > > > 3. Please unify the descriptions: use "CarbonData" to replace
> >> > > > "Carbon", and unify "destination table" and "target table".
> >> > > > 4. Is the Update operation's delete delta the same as the Delete
> >> > > > operation's delete delta?
> >> > > >
> >> > > > BTW, it would be much better if you could provide Google Docs for
> >> > > > review next time; it is really difficult to give comments based
> >> > > > on PDF documents :)
> >> > > >
> >> > > > Regards
> >> > > > Liang
> >> > > >
> >> > > > Aniket Adnaik wrote
> >> > > > > Hi Sujith,
> >> > > > >
> >> > > > > Please see my comments inline.
> >> > > > >
> >> > > > > Best Regards,
> >> > > > > Aniket
> >> > > > >
> >> > > > > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <
> >> > > >
> >> > > > > sujithchacko.2010@
> >> > > >
> >> > > > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Hi Aniket,
> >> > > > >>
> >> > > > >> Its a well documented design, just want to know few
> points
> >> > like
> >> > > > >>
> >> > > > >> a. Format of the RowID and its datatype
> >> > > > >>
> >> > > > > AA>> The following format can be used to represent a unique rowid:
> >> > > > >
> >> > > > > [
> >> > > > > <Segment ID>
> >> > > > > <Block ID>
> >> > > > > <Blocklet ID>
> >> > > > > <Offset in Blocklet>
> >> > > > > ]
> >> > > > > A simple way would be to use the String data type and store it
> >> > > > > as a text file. However, a more efficient way could be to use
> >> > > > > BitSets/bitmaps as a further optimization. Compressed bitmaps
> >> > > > > such as Roaring bitmaps can be used for better performance and
> >> > > > > efficient storage.
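The four-part rowid described above could be composed and parsed along these lines. The `/` separator and function names are illustrative choices for the sketch, not the actual CarbonData encoding.

```python
def make_rowid(segment_id, block_id, blocklet_id, offset):
    """Compose the four-part rowid (segment, block, blocklet, offset in
    blocklet) into a single string. Separator choice is illustrative."""
    return f"{segment_id}/{block_id}/{blocklet_id}/{offset}"

def parse_rowid(rowid):
    """Split a rowid string back into its four components, converting the
    blocklet offset back to an integer."""
    segment_id, block_id, blocklet_id, offset = rowid.split("/")
    return segment_id, block_id, blocklet_id, int(offset)
```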
> >> > > > >
> >> > > > > b. Impact of this feature on select queries: since every query
> >> > > > > has to exclude each deleted record and include the
> >> > > > > corresponding updated record, is any optimization being
> >> > > > > considered to tackle the query performance issue, since one of
> >> > > > > the major highlights of CarbonData is performance?
> >> > > > > AA>> Some of the optimizations would be to cache the deltas to
> >> > > > > avoid recurrent I/O, to store sorted rowids in the delete delta
> >> > > > > for efficient lookup, and to perform regular compaction to
> >> > > > > minimize the impact on select query performance. Additionally,
> >> > > > > we may have to explore ways to perform compaction
> >> > > > > automatically, for example, if more than 25% of rows are read
> >> > > > > from deltas.
> >> > > > > Please feel free to share if you have any ideas or suggestions.
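The sorted-rowid lookup and the automatic-compaction trigger mentioned above could be sketched as below, using simple integer offsets in place of full rowids. The class and method names are invented for this sketch.

```python
import bisect

class DeleteDelta:
    """Keep deleted row offsets sorted so membership checks during a
    select are O(log n) binary searches instead of linear scans.
    Illustrative only; not CarbonData's actual delete delta format.
    """

    def __init__(self, deleted_offsets):
        self.offsets = sorted(deleted_offsets)

    def is_deleted(self, offset):
        # Binary-search the sorted offsets for an exact match.
        i = bisect.bisect_left(self.offsets, offset)
        return i < len(self.offsets) and self.offsets[i] == offset

    def deleted_fraction(self, total_rows):
        # Could feed an automatic-compaction trigger, e.g. the 25%
        # figure mentioned in the thread.
        return len(self.offsets) / total_rows if total_rows else 0.0
```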
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Sujith
> >> > > > >
> >> > > > > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <
> >> > > >
> >> > > > > aniket.adnaik@
> >> > > >
> >> > > > > > wrote:
> >> > > > >
> >> > > > >> Hi All,
> >> > > > >>
> >> > > > >> Please find a design doc for Update/Delete support in
> CarbonData.
> >> > > > >>
> >> > > > >>
> >> > > > >> https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> >> > > > >>
> >> > > > >> Best Regards,
> >> > > > >> Aniket
> >> > > > >>
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > View this message in context:
> >> > > > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3093.html
> >> > > > Sent from the Apache CarbonData Mailing List archive at Nabble.com.
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>