Hello,
My team is trying to implement a merge operation. The merge scenario is as follows: compare records in two tables (same structure, different number of records) and modify the big one:
1. if small.id = big.id and small.date > big.date, then update bigtable;
2. if small.id is not in big, then insert into bigtable.
Our solution to this scenario is:
1. append the smalltable to bigtable;
2. delete records from bigtable that share the same id, keeping only the one with the latest date; this runs concurrently in the background.
My question is whether the Apache CarbonData community plans to implement a similar operation?
孙而焓 [FFCS Research Institute]
Hi
1. Can you give a specific example, so that we can first understand your requirement exactly? For instance, provide some fact data like below:

ID  date        name    age
1   2017-05-1   carbon  21
2   2017-05-23  spark   30
......

2. I would like to kindly invite your team to participate in contributing this feature if it is confirmed by the dev community.

Regards
Liang

In reply to [hidden email], 2017-05-26 12:47 GMT+08:00.
A merge example looks like this:

small:
id  updatetime
1   9:00
2   8:00
6   9:00

big:
id  updatetime
1   10:00
2   7:00
3   9:00
4   9:00
5   9:00

For each record in small:
id=1: small.updatetime < big.updatetime, do nothing;
id=2: small.updatetime > big.updatetime, update big;
id=6: big doesn't have that record, insert into big.

Our solution: append all small records to big,

big:
id  updatetime
1   10:00
2   7:00 (to be deleted)
3   9:00
4   9:00
5   9:00
1   9:00 (to be deleted)
2   8:00
6   9:00

Then, for records in big that have the same id, only the one with the max updatetime stays.
孙而焓 [FFCS Research Institute]
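The append-then-deduplicate approach described above can be sketched in plain Python. This is only an illustration of the logic on the example data, not CarbonData code: the function names `parse_hhmm` and `merge_append_then_dedupe` are hypothetical, rows are modeled as `(id, updatetime)` tuples, and ties on updatetime are assumed to keep the row already in big.

```python
from datetime import time


def parse_hhmm(text):
    """Turn a string such as '9:00' into a comparable datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def merge_append_then_dedupe(big, small):
    """Append small to big, then keep only the latest row per id."""
    appended = big + small  # step 1: append all small records to big
    latest = {}
    # step 2: among rows sharing an id, keep the one with the max updatetime
    for row_id, updatetime in appended:
        current = latest.get(row_id)
        if current is None or parse_hhmm(updatetime) > parse_hhmm(current):
            latest[row_id] = updatetime
    return sorted(latest.items())


# Example data taken from the message above.
small = [(1, "9:00"), (2, "8:00"), (6, "9:00")]
big = [(1, "10:00"), (2, "7:00"), (3, "9:00"), (4, "9:00"), (5, "9:00")]

merged = merge_append_then_dedupe(big, small)
print(merged)
```

On the example data, id=1 keeps 10:00, id=2 becomes 8:00, and id=6 is added, which matches the three cases listed in the message.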
Hi
For this case, would delete and append meet your requirements? Obviously, merge would impact the index, so we should find the best way to implement this feature. Others, please comment as well.

Regards
Liang

In reply to Mic Sun, 2017-05-27 9:45 GMT+08:00.
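One possible reading of the delete-and-append suggestion, sketched in plain Python on the same example data: first delete from big the rows that a newer small record will replace, then append only the small records that are new or newer. This is an assumption about what was meant, not a CarbonData API; `to_minutes` and `merge_delete_then_append` are hypothetical names, and ids are assumed to be unique within big.

```python
def to_minutes(hhmm):
    """Convert a string such as '9:00' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def merge_delete_then_append(big, small):
    """Delete stale rows from big, then append the new or newer rows from small."""
    big_times = dict(big)  # assumes each id appears at most once in big

    # Rows from small that belong in big: unseen ids, or a strictly newer updatetime.
    incoming = [
        (row_id, updatetime)
        for row_id, updatetime in small
        if row_id not in big_times
        or to_minutes(updatetime) > to_minutes(big_times[row_id])
    ]
    incoming_ids = {row_id for row_id, _ in incoming}

    # Step 1: delete the rows in big that are about to be replaced.
    kept = [(row_id, updatetime) for row_id, updatetime in big if row_id not in incoming_ids]

    # Step 2: append the incoming rows.
    return kept + incoming


small = [(1, "9:00"), (2, "8:00"), (6, "9:00")]
big = [(1, "10:00"), (2, "7:00"), (3, "9:00"), (4, "9:00"), (5, "9:00")]

result = merge_delete_then_append(big, small)
print(result)
```

Compared with appending everything first, this variant never writes the stale small record for id=1, at the cost of comparing small against big before the append.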