Hi dev,
Versions: Spark 2.1.1, CarbonData 1.1.1, Hadoop 2.7.2
Test tables:
xitest2: 2 billion rows
xitemp2: 0 rows
xitemp: 950 rows
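For reference, a minimal sketch of the schemas the statements below imply; only the column names come from the UPDATE statements, the types are assumptions:

cc.sql("create table xitest2 (pkid string, qqnum string, nick string, age int, gender string, auth string, qunnum string) stored by 'carbondata'")
cc.sql("create table xitemp2 (pkid string, qqnum string, nick string, age int, gender string, auth string, qunnum string) stored by 'carbondata'")
// xitemp: same assumed schema as xitemp2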
Run SQL (the subquery reads xitemp2, which is empty):
cc.sql("update xitest2 a set (a.qqnum,a.nick,a.age,a.gender,a.auth,a.qunnum)=(select b.qqnum,b.nick,b.age,b.gender,b.auth,b.qunnum from xitemp2 b where b.pkid=a.pkid)").show
shuffle read: 336.5 KB
shuffle write: 336.5 KB
Run SQL (same statement, but the subquery reads xitemp, 950 rows):
cc.sql("update xitest2 a set (a.qqnum,a.nick,a.age,a.gender,a.auth,a.qunnum)=(select b.qqnum,b.nick,b.age,b.gender,b.auth,b.qunnum from xitemp b where b.pkid=a.pkid)").show
shuffle read: 1224 MB
shuffle write: 2.4 GB
When an UPDATE uses a subquery and the subquery data is (close to) zero rows, the shuffle is disproportionately large (1224 MB read / 2.4 GB write for only 950 source rows); this could be optimized.
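One possible application-level workaround (a minimal sketch, assuming cc is the same context used above) is to count the source rows first and skip the UPDATE, and therefore its shuffle, when there is nothing to apply:

val srcRows = cc.sql("select count(*) from xitemp2").collect()(0).getLong(0)  // count(*) is a bigint, read as Long
if (srcRows > 0) {
  // only pay for the join/shuffle when the source table actually has rows
  cc.sql("update xitest2 a set (a.qqnum,a.nick,a.age,a.gender,a.auth,a.qunnum)=(select b.qqnum,b.nick,b.age,b.gender,b.auth,b.qunnum from xitemp2 b where b.pkid=a.pkid)").show
}

This only covers the empty-source case; the disproportionate shuffle for small non-empty sources would still need a fix inside CarbonData's update path.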
yixu2001