http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Update-feature-enhancement-tp99769p100339.html
It is better to make this consistent across all. Creating a new segment simplifies
> Hi David,
>
> +1
>
>
>
> Initially, when the segment concept was introduced, a segment was viewed as a
> folder that is added incrementally over time, so data retention use cases like
> "delete segments before a given date" were considered. In that case, if
> updated records are written into a new segment, old records become
> new records and the retention model no longer works on that data. So updated
> records were written to the same segment folder.
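The retention concern above can be illustrated with a minimal sketch. This is not CarbonData code: `Segment`, `delete_segments_before`, and the field names are hypothetical, and a segment is modelled simply as a folder with a load date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Segment:
    segment_id: int
    load_date: date  # when the segment folder was created (hypothetical field)
    rows: list       # (key, event_date) pairs


def delete_segments_before(segments, cutoff):
    """Folder-level retention: keep only segments loaded on or after cutoff."""
    return [s for s in segments if s.load_date >= cutoff]


# Segment 0 was loaded in January and holds rows "a" and "b".
old = Segment(0, date(2020, 1, 1),
              [("a", date(2020, 1, 1)), ("b", date(2020, 1, 1))])
# Row "a" is updated in September and its new version goes to a NEW segment.
updated = Segment(1, date(2020, 9, 1), [("a", date(2020, 1, 1))])

kept = delete_segments_before([old, updated], cutoff=date(2020, 6, 1))
kept_ids = [s.segment_id for s in kept]
kept_keys = [key for s in kept for key, _ in s.rows]
```

Row "a" carries a January event date but survives a June cutoff because it now lives in a September segment, which is why retention based on a partition or a time column is the cleaner approach.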
>
>
>
> But later the partition concept was introduced, which is a cleaner
> way to implement retention; even a delete by time column is a
> better method.
> So inserting the updated records into a new segment makes sense.
>
>
>
> The only disadvantage may be in later supporting the one-column data
> update/replace feature that Likun mentioned previously.
>
>
>
> So, to generalize, the update feature can support inserting the updated
> records into a new segment. The logic to reload indexes when segments are
> updated can still be there; however, when no data is inserted into old
> segments, reloading their indexes should be avoided.
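The reload rule described above can be sketched as follows. The function name and the `touched` mapping are assumptions made for illustration, not CarbonData APIs; the point is only that segments receiving no inserted rows are skipped.

```python
def segments_needing_index_load(touched):
    """touched maps a segment id to the set of operations an update applied.

    Only segments that received inserted rows need their indexes (re)loaded;
    segments that only received delete markers are skipped.
    """
    return sorted(seg for seg, ops in touched.items() if "insert" in ops)


# Updated rows written back into the segments they came from.
in_place = {"0": {"delete", "insert"}, "1": {"delete", "insert"}}
# Updated rows written to new segment "2"; old segments only get deletes.
as_new_segment = {"0": {"delete"}, "1": {"delete"}, "2": {"insert"}}

reload_in_place = segments_needing_index_load(in_place)
reload_new_segment = segments_needing_index_load(as_new_segment)
```

With in-place writes every touched segment is reloaded, while with a new segment only that one segment needs its indexes loaded, regardless of how many old segments the update touched.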
>
>
>
> A growing number of segments need not block this proposal: too many
> segments is a problem in any case and needs to be solved using compaction,
> either horizontal or vertical. Also, optimizing segment file storage,
> either file-based or DB-based (embedded or external), for very large
> deployments needs to be solved independently.
>
>
>
> Regards,
>
> Ramana
>
>
>
> On Sat, Sep 5, 2020 at 7:58 AM Ajantha Bhat <[hidden email]> wrote:
>
>
>
> > Hi David. Thanks for proposing this.
>
> >
>
> > *+1 from my side.*
>
> >
>
> > I have seen users with tables of 200K segments stored in the cloud.
> > Reloading the indexes (SI, min-max, MV) of every segment touched by an
> > update will be really slow.
>
> >
>
> > So it is good to write the updated data as a new segment and load only the
> > new segment's indexes (try to reuse the flow with
> > UpdateTableModel.loadAsNewSegment = true).
>
> >
>
> > The user can then compact the segments to avoid accumulating many new
> > segments created by updates. I guess we can also move the compacted
> > segments to the table status history to avoid extra entries in table status.
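As a rough sketch of the compaction idea above, assume table status is a list of segment entries and the history is a separate list. The `compact` function, the dictionary layout, and the merged segment id are all hypothetical and do not reflect CarbonData's actual table status format.

```python
def compact(table_status, history, segment_ids, new_id):
    """Merge the given segments into one entry and archive the originals.

    Returns the new table status; compacted entries move to `history`
    so the live status stays short.
    """
    merged_rows = []
    remaining = []
    for entry in table_status:
        if entry["id"] in segment_ids:
            merged_rows.extend(entry["rows"])
            history.append({"id": entry["id"], "status": "COMPACTED"})
        else:
            remaining.append(entry)
    remaining.append({"id": new_id, "rows": merged_rows})
    return remaining


status = [
    {"id": "0", "rows": [1]},
    {"id": "1", "rows": [2]},  # created by an update
    {"id": "2", "rows": [3]},  # created by another update
]
history = []
status = compact(status, history, {"1", "2"}, new_id="1.1")
```

After the call the live status holds two entries instead of three, and the two compacted segments are recorded only in the history list.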
>
> >
>
> > Thanks,
>
> > Ajantha
>
> >
>
> >
>
> >
>
> > On Fri, Sep 4, 2020 at 1:48 PM David CaiQiang <[hidden email]> wrote:
>
> >
>
> > > Hi Akash,
>
> > >
>
> > > 3. An update operation contains an insert operation. The update
> > > operation will handle this issue the same way the insert operation does.
>
> > >
>
> > >
>
> > >
>
> > > -----
>
> > > Best Regards
>
> > > David Cai
>
> > > --
>
> > > Sent from:
>
> > >
>
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/>
> > >
>
> >
>
> --