Re: [Discussion] Update feature enhancement

Posted by ravipesala on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Update-feature-enhancement-tp99769p100339.html

+1
Partition loading already uses a new segment to write the update delta
data.

It is better to make this consistent across all cases. Creating a new
segment simplifies the design.
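
As a rough illustration of the idea discussed in this thread (not CarbonData
code), the Python sketch below models an update that masks the old rows with
a delete delta and writes the new values into a fresh segment, so that only
the new segment's indexes need loading. All names here (Segment, Table,
index_loads) are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    segment_id: int
    rows: dict                                  # row key -> value
    deleted: set = field(default_factory=set)   # keys masked by a delete delta


@dataclass
class Table:
    segments: list = field(default_factory=list)
    # Hypothetical audit trail: ids of segments whose indexes were loaded.
    index_loads: list = field(default_factory=list)

    def load(self, rows):
        """Write rows as a new segment and load indexes for that segment only."""
        seg = Segment(len(self.segments), dict(rows))
        self.segments.append(seg)
        self.index_loads.append(seg.segment_id)
        return seg

    def update(self, changes):
        """Apply {key: new_value}: mask old rows, write new values to one new segment."""
        updated = {}
        for seg in self.segments:
            for key, value in changes.items():
                if key in seg.rows and key not in seg.deleted:
                    seg.deleted.add(key)        # delete delta on the old segment
                    updated[key] = value
        if not updated:
            return None                         # nothing matched, no new segment
        return self.load(updated)               # old segments get no inserted data

    def scan(self):
        """Return the live rows across all segments."""
        return {k: v
                for seg in self.segments
                for k, v in seg.rows.items()
                if k not in seg.deleted}
```

In this model the old segments only receive delete markers, so their data
files stay as they were and only the newly written segment is indexed.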



On Mon, 14 Sep 2020 at 1:48 AM, Venkata Gollamudi <[hidden email]>
wrote:

> Hi David,
>
> +1
>
> Initially, when the segment concept was introduced, a segment was viewed as
> a folder added incrementally over time, so that data retention use cases
> like "delete segments before a given date" could be supported. In that
> model, if updated records are written into a new segment, old records
> become new records and the retention model no longer works on that data.
> So updated records were written to the same segment folder.
>
> But later the partition concept was introduced, which is a cleaner way to
> implement retention; deleting by a time column is an even better method.
> So inserting new records into the new segment makes sense.
>
> The only disadvantage may be in later supporting the one-column data
> update/replace feature that Likun mentioned previously.
>
> So, to generalize, the update feature can support inserting the updated
> records into a new segment. The logic to reload indexes when segments are
> updated can remain; however, when no data is inserted into old segments,
> reloading their indexes should be avoided.
>
> The increase in the number of segments need not be a reason to hold this
> back, as a growing segment count is a problem anyway and needs to be
> solved using compaction, either horizontal or vertical. Also, optimization
> of segment file storage, either file-based or DB-based (embedded or
> external), for very large deployments needs to be solved independently.
>
> Regards,
> Ramana
>
> On Sat, Sep 5, 2020 at 7:58 AM Ajantha Bhat <[hidden email]> wrote:
>
> > Hi David. Thanks for proposing this.
> >
> > *+1 from my side.*
> >
> > I have seen users with tables of 200K segments stored in the cloud.
> > It would be really slow to reload indexes like SI, min-max, and MV for
> > all the segments where an update happened.
> >
> > So it is good to write the updated data as a new segment
> > and load only the new segment's indexes (try to reuse the flow
> > UpdateTableModel.loadAsNewSegment = true).
> >
> > Users can then compact the segments to avoid accumulating many new
> > segments created by updates.
> > I guess we can also move the compacted segments to the table status
> > history to avoid more entries in table status.
> >
> > Thanks,
> > Ajantha
> >
> > On Fri, Sep 4, 2020 at 1:48 PM David CaiQiang <[hidden email]>
> > wrote:
> >
> > > Hi Akash,
> > >
> > >     3. An update operation contains an insert operation. The update
> > > operation will handle this issue the same way the insert operation does.
> > >
> > > -----
> > > Best Regards
> > > David Cai
> > > --
> > > Sent from:
> > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> --
Thanks & Regards,
Ravi