Re: Clean files enhancement
Posted by vikramahuja1001
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Clean-files-enhancement-tp100088p101810.html
Hi all,
Please find the design document attached.
Please provide suggestions or feedback.
Vikram Ahuja
Thanks for the suggestion, Ravi.
We can add an option to the clean files command that decides whether to perform a dry run.
clean files on table t1 options('dry_run' = true) --> This only shows the segments that would be removed; it does not clean or delete those segments or any other data.
By default, dry_run is false, and the user can set it explicitly when they want a dry run.
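To make the proposed semantics concrete, here is a minimal Python sketch of how a clean files run could combine the dry_run option with the trash rules discussed in this thread (in-progress data goes to the trash; compacted and marked-for-delete segments are deleted directly). Everything in it is hypothetical: Status, Segment, plan_clean_files and clean_files are illustrative names, not CarbonData classes or APIs, and the two callbacks stand in for whatever actually moves or deletes segment files. It also assumes the caller has already confirmed that an in-progress segment is stale, i.e. not owned by a load that is still running.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    # Hypothetical segment states named after the ones used in this thread,
    # not CarbonData's own status enum.
    SUCCESS = "Success"
    IN_PROGRESS = "In Progress"
    MARKED_FOR_DELETE = "Marked for Delete"
    COMPACTED = "Compacted"


@dataclass
class Segment:
    segment_id: str
    status: Status


def plan_clean_files(segments: List[Segment]) -> Dict[str, List[str]]:
    """Decide what a clean files run would do with each segment, without touching data."""
    plan: Dict[str, List[str]] = {"move_to_trash": [], "delete": [], "keep": []}
    for seg in segments:
        if seg.status is Status.IN_PROGRESS:
            # Possibly useful data goes to the trash so it can be recovered.
            # Assumption: staleness (no load still writing it) was checked beforehand.
            plan["move_to_trash"].append(seg.segment_id)
        elif seg.status in (Status.MARKED_FOR_DELETE, Status.COMPACTED):
            # Compacted / marked-for-delete segments bypass the trash entirely.
            plan["delete"].append(seg.segment_id)
        else:
            plan["keep"].append(seg.segment_id)
    return plan


def clean_files(
    segments: List[Segment],
    move_to_trash: Callable[[str], None],
    delete: Callable[[str], None],
    dry_run: bool = False,  # proposed default: false
) -> Dict[str, List[str]]:
    plan = plan_clean_files(segments)
    if dry_run:
        # Dry run: report what would happen; neither callback is invoked.
        return plan
    for seg_id in plan["move_to_trash"]:
        move_to_trash(seg_id)
    for seg_id in plan["delete"]:
        delete(seg_id)
    return plan
```

With dry_run=True the plan is returned and nothing is moved or deleted, which matches the "show what would be removed" behavior; with the default dry_run=False the same plan is executed, so the user sees exactly what the actual run will do before committing to it.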
Rgds,
Vikram
+1 for Ravi's comment. It is a cleaner and safer approach.
Regards,
Akash R Nilugal
On Thu, Sep 24, 2020, 8:34 PM Ravindra Pesala <[hidden email]> wrote:
> Hi Vikram,
>
> +1
>
> It is good to remove the automatic cleanup.
> But I am still worried about the clean files command when it is executed by
> the user. We need to enhance the clean files command with a dry run that
> prints which segments are going to be deleted and which will be left. If the
> user is OK with the dry-run result, they can go for the actual run.
>
> Regards,
> Ravindra.
>
> On Mon, 21 Sep 2020 at 1:27 PM, Vikram Ahuja <[hidden email]> wrote:
>
> > Hi Ravi and David,
> >
> > 1. All automatic data cleanup in the case of load/insert/compact/delete
> > will be removed, so cleaning will only happen when the clean files command
> > is called.
> >
> > 2. Data will be moved to the trash only when we clean data that is in the
> > IN PROGRESS state. Compacted and Marked For Delete segments will not be
> > moved to the trash; they will be deleted directly. The user will therefore
> > only be able to recover In Progress segments. @Ravi -> Is this okay for
> > trash usage, i.e. using it only for In Progress segments?
> >
> > 3. No trash management will be implemented. Data will be removed from the
> > trash folder ONLY when the clean files command is called, and it will be
> > removed immediately at that point. There will be no time to live; data can
> > stay in the trash folder until the user triggers the clean files command.
> >
> > Let me know if you have any questions.
> >
> > Vikram Ahuja
> >
> > On Fri, Sep 18, 2020 at 1:43 PM David CaiQiang <[hidden email]> wrote:
> >
> > > Agree with Ravindra.
> > >
> > > 1. Stop all automatic data cleaning in load/insert/compact/update/delete.
> > >
> > > 2. When the clean files command cleans in-progress or uncertain data, we
> > > can move that data to the data trash. This prevents useful data from
> > > being deleted by mistake; we have already seen this issue in some
> > > scenarios. Other cases (for example, cleaning mark_for_delete/compacted
> > > segments) should not use the data trash folder and should clean the data
> > > directly.
> > >
> > > 3. There is no need for data trash management; I suggest keeping it
> > > simple. It will be enough for the clean files command to support emptying
> > > the trash immediately.
> > >
> > > -----
> > > Best Regards
> > > David Cai
> > > --
> > > Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>
> --
> Thanks & Regards,
> Ravi
>