can't apply mappartitions to dataframe generated from carboncontext
Posted by
孙而焓 on
Jun 12, 2017; 3:22am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/can-t-apply-mappartitions-to-dataframe-generated-from-carboncontext-tp14565.html
hi:
appendix are the full error message
I try to modify dataframe row in non-sql way and get Task not serializable,my test procedure like follows:
1.val df=cc.sql(select * from t1)
2.def function1 (iterator: Iterator[Row]):Iterator[Row]={
var list=scala.collection.mutable.ListBuffer[Row]()
while (iterator.hasNext) {
var r=iterator.next if(r.getAs[String]("col1").toString.equalsIgnoreCase(r.getAs[String]("col2").toString)) list+=r}
list.iterator
}
3.df.mapPartitions(r=>function1 (r))
I also apply function1 to dataframe generated from sqlContext work fine,
so i believe carboncontext is refering some outer variables that are not serializable.
孙而焓【FFCS研究院】