Support updating/deleting data for stream table


xm_zzc
Hi dev:
  Sometimes we need to delete historical data from a stream table to keep
the table from growing too large, but a stream table currently does not
support updating or deleting data. We have to stop the app, close the stream
table with the ALTER TABLE <table_name> COMPACT 'close_streaming' command,
and then delete the data.
  After an offline discussion with Jacky and David, we see two
solutions that avoid stopping the app:
 
  1. Set the 'carbon.input.segments.tablename' property to all non-stream
segments, so that the delete applies to everything except the stream segments. This is easy to implement, but not precise when some of the data to delete is stored in stream segments.
  2. Support deleting data in stream segments too. This is more complicated, but precise.
 
  I think we can implement solution 1 first, and then consider the
implementation of solution 2 in depth.
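
  A minimal sketch of how solution 1 could be driven from the application
side, not an actual implementation. It assumes segments are listed with a
status string, that stream segments carry the status "Streaming" or
"Streaming Finish", and that the property is spelled
carbon.input.segments.<db>.<table>. The Segment class, the function names and
the example table are hypothetical. Making DELETE honour the property is the
part this proposal would still have to implement.

```python
from dataclasses import dataclass

# Assumption: status strings that mark a segment as a stream segment.
STREAM_STATUSES = {"Streaming", "Streaming Finish"}


@dataclass(frozen=True)
class Segment:
    """Hypothetical view of one row of a segment listing."""
    segment_id: str
    status: str


def non_stream_segment_ids(segments):
    """Return the ids of all segments that are not stream segments."""
    return [s.segment_id for s in segments if s.status not in STREAM_STATUSES]


def build_delete_statements(db, table, segments, where_clause):
    """Build the SQL statements for a delete restricted to non-stream segments.

    Returns an empty list when there is no non-stream segment to delete from.
    A real implementation would probably also skip invalid segments.
    """
    ids = non_stream_segment_ids(segments)
    if not ids:
        return []
    prop = f"carbon.input.segments.{db}.{table}"
    return [
        f"SET {prop}={','.join(ids)}",                      # restrict to non-stream segments
        f"DELETE FROM {db}.{table} WHERE {where_clause}",   # delete within that restriction
        f"SET {prop}=*",                                    # restore access to all segments
    ]


if __name__ == "__main__":
    segs = [Segment("0", "Success"), Segment("1", "Success"), Segment("2", "Streaming")]
    for stmt in build_delete_statements("default", "events", segs, "event_date < '2017-06-01'"):
        print(stmt)
```

  Rows that match the condition but still sit in segment 2 are not touched,
which is the imprecision mentioned in solution 1.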
 
  Feedback is welcome, thanks.



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Support updating/deleting data for stream table

ZhuWilliam
As time passes, the table will grow bigger and bigger, so we do need a
way to clean out data that is out of date.
Making the stream table support partitions may be a good choice. As a first
step, to keep this simple, today's partition can hold a mix of normal segments
and stream segments. When rolling over, we convert the partitions for earlier
days into normal segments.




Re: Support updating/deleting data for stream table

Liang Chen
Administrator
In reply to this post by xm_zzc
Hi

Thank you for starting this discussion thread.
I agree with solution 1: use the easy way to delete data from the stream table.

Regards
Liang


Re: Support updating/deleting data for stream table

sraghunandan
Hi,
But this leads to inconsistent data being returned to the user. For example,
if the user wants to replace all 2g with 4g, the stream segments will still
contain 2g, and that value would be returned by the user's query. I think we
need to handle this scenario as well.
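
The inconsistency can be shown with a toy in-memory model. This is only an
illustration under assumed names: the table layout, the "net" column and both
functions are hypothetical and do not correspond to any CarbonData API.

```python
def update_non_stream(segments, column, old, new):
    """Apply an update only to rows in non-stream segments (solution 1)."""
    result = {}
    for name, seg in segments.items():
        if seg["stream"]:
            # Stream segments are skipped, so their rows keep the old value.
            result[name] = seg
        else:
            rows = [{**row, column: new} if row[column] == old else row
                    for row in seg["rows"]]
            result[name] = {"stream": False, "rows": rows}
    return result


def query_all(segments, column):
    """A user query reads every segment, stream segments included."""
    return [row[column] for seg in segments.values() for row in seg["rows"]]


if __name__ == "__main__":
    table = {
        "0": {"stream": False, "rows": [{"net": "2g"}, {"net": "3g"}]},
        "1": {"stream": True, "rows": [{"net": "2g"}]},
    }
    updated = update_non_stream(table, "net", "2g", "4g")
    print(query_all(updated, "net"))  # the stream segment still returns 2g
```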

Regards
Raghu

On Wed, 30 May 2018, 9:22 pm Liang Chen, <[hidden email]> wrote:

> Hi
>
> Thank you for starting this discussion thread.
> I agree with solution 1: use the easy way to delete data from the stream table.
>
> Regards
> Liang

Re: Support updating/deleting data for stream table

xm_zzc
Hi Raghu:
  Yes, you are right. That is why I said solution 1 is not very precise when
some of the data you want to update or delete is still stored in stream
segments. Solution 2 can handle the scenario you mentioned.
  But, in my opinion, deleting historical data is a more common scenario
than updating data. The size of a stream table grows day by day, and users
generally want to delete specific data to keep it from getting too large.
For example, a user who wants to keep one year of data needs to delete the
data older than one year every day. On the other hand, solution 2 is more
complicated than solution 1, and we need to consider its implementation
in depth.
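
  The daily clean-up in the example could be sketched as below. This is an
illustration only: the function names, the table name and the date column are
hypothetical, and the statement assumes the table has a date column that can
be compared with an ISO date string.

```python
from datetime import date


def retention_cutoff(today, keep_years=1):
    """Return the oldest date to keep: the same day, keep_years earlier."""
    try:
        return today.replace(year=today.year - keep_years)
    except ValueError:
        # 29 Feb has no counterpart in a non-leap year; fall back to 28 Feb.
        return today.replace(year=today.year - keep_years, day=28)


def daily_delete_statement(table, date_column, today, keep_years=1):
    """Build the DELETE statement that drops rows older than the cutoff."""
    cutoff = retention_cutoff(today, keep_years)
    return f"DELETE FROM {table} WHERE {date_column} < '{cutoff.isoformat()}'"


if __name__ == "__main__":
    print(daily_delete_statement("events", "event_date", date(2018, 6, 3)))
```

  With a one-year retention, the rows this statement targets are old enough
that they would normally sit in non-stream segments, which is why solution 1
covers this case.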
  For these reasons, Liang Chen, Jacky, David and I prefer to
implement solution 1 first. Is that OK with you?
 
  Are there any other suggestions?




Re: Support updating/deleting data for stream table

Liang Chen
Administrator
Hi

+1 for considering solution 1 first

Regards
Liang


Re: Support updating/deleting data for stream table

sraghunandan
Hi,
Those are two steps of the same solution, not different solutions. We can
create a parent JIRA that covers both and implement only one part for now.
The parent JIRA would be closed once all the child JIRAs are implemented.

Regards
Raghu

On Sun, 3 Jun 2018, 1:07 pm Liang Chen, <[hidden email]> wrote:

> Hi
>
> +1 for considering solution 1 first
>
> Regards
> Liang

Re: Support updating/deleting data for stream table

xm_zzc
Hi:
  OK, I will create a parent JIRA to track this issue.


