Hi dev:
Sometimes we need to delete historical data from a stream table to keep the table size under control, but stream tables currently do not support updating or deleting data. Today we have to stop the application, run the `ALTER TABLE ... COMPACT 'close_streaming'` command to close the stream table, and then delete the data. Based on an offline discussion with Jacky and David, there are two solutions for doing this without stopping the application:

1. Set all non-stream segments in the `carbon.input.segments.<dbname>.<tablename>` property so the delete only touches those segments. This is easy to implement, but not precise when some of the target data is still stored in stream segments.
2. Support deleting data from stream segments as well. This is more complicated, but precise.

I think we can implement solution 1 first, and then study the implementation of solution 2 in depth. Feedback is welcome, thanks.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
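As a rough sketch of solution 1 (the table name, columns, and segment IDs below are hypothetical; the `carbon.input.segments` session property and `SHOW SEGMENTS` are existing CarbonData features, but scoping a DELETE this way is exactly what this proposal would add):

```sql
-- List all segments; suppose segment 3 is the active stream segment.
SHOW SEGMENTS FOR TABLE default.sales;

-- Restrict this session's view of 'sales' to the non-stream segments only.
SET carbon.input.segments.default.sales = 0,1,2;

-- Delete historical rows; the stream segment is never touched, so the
-- streaming ingestion job can keep running.
DELETE FROM default.sales WHERE order_date < '2017-06-01';

-- Restore visibility of all segments, including the stream segment.
SET carbon.input.segments.default.sales = *;
```

Any rows matching the predicate that sit in the stream segment would survive the delete, which is the imprecision the proposal acknowledges.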
As time passes, the table will become bigger and bigger, so we do need a way to clean out data that is out of date. Making stream tables support partitioning may be a good choice. As a first step we can keep it simple: today's data would be a mix of normal segments and stream segments, and when rolling over, the partitions other than today's would become normal segments.
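If stream tables gained partition support as proposed, retention could amount to dropping old partitions once they have rolled over to normal segments. A sketch only: the table and columns are made up, and the DDL assumes the partition syntax CarbonData already offers for non-stream tables would carry over:

```sql
-- Hypothetical partitioned stream table.
CREATE TABLE events (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (event_date DATE)
STORED BY 'carbondata'
TBLPROPERTIES ('streaming' = 'true');

-- After the roll-over, partitions other than today's contain only
-- normal segments and could be dropped to reclaim space.
ALTER TABLE events DROP PARTITION (event_date = '2017-06-01');
```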
In reply to this post by xm_zzc
Hi
Thank you for starting this discussion thread. I agree with solution 1: use the easy way to delete data from stream tables.

Regards
Liang

xm_zzc wrote:
> …
Hi,
But this leads to inconsistent data being returned to the user. For example, if the user wanted to replace all '2g' values with '4g', the stream segments would still contain '2g', and those rows would be returned in user queries. I think we need to handle this scenario as well.

Regards
Raghu

On Wed, 30 May 2018, 9:22 pm Liang Chen, <[hidden email]> wrote:
> …
Hi Raghu:
Yes, you are right; that is why I said solution 1 is not precise when some of the data you want to update or delete is still stored in stream segments. Solution 2 can handle the scenario you mentioned. But in my opinion, deleting historical data is a more common scenario than updating it: the size of a stream table grows day by day, and users generally want to delete specific data to keep that size under control. For example, a user who wants to keep one year of data needs to delete the data older than one year every day. On the other hand, solution 2 is more complicated than solution 1, and we need to study its implementation in depth. For these reasons, Liang Chen, Jacky, David and I prefer to implement solution 1 first. Is that OK with you? Are there any other suggestions?
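A daily one-year retention job along these lines under solution 1 might look roughly as follows (the table name is hypothetical, and in practice the segment ID list would be computed from `SHOW SEGMENTS` output on each run rather than hard-coded):

```sql
-- Scope the session to the non-stream segments before deleting.
SET carbon.input.segments.default.metrics = 0,1,2,3;

-- Remove everything older than one year; date_sub/current_date are
-- standard Spark SQL functions.
DELETE FROM default.metrics
WHERE event_time < date_sub(current_date(), 365);

-- Make all segments visible again for normal queries.
SET carbon.input.segments.default.metrics = *;
```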
Hi
+1 for first considering solution 1.

Regards
Liang

xm_zzc wrote:
> …
Hi,
Those are two steps of the same solution, not two different solutions. We can create a JIRA that covers both and implement only the first part; the parent JIRA would be closed once all the child JIRAs are implemented.

Regards
Raghu

On Sun, 3 Jun 2018, 1:07 pm Liang Chen, <[hidden email]> wrote:
> …
Hi:
OK, I will create a parent JIRA to track this issue.