Since subqueries are not supported in Spark 1.6 + CarbonData 1.1.0, I decided to prefetch the id values into a Scala list:
spark-shell>>

    var temp1=cc.sql("select id from table where limit 1000").select("id").rdd.map(r => r(0)).collect().mkString(",")
    cc.sql(s"""delete from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4 where ORDER_ITEM_PROC_ATTR_ID in ( $temp1) """).show

and I get errors like the following:

    WARN ExecutorAllocationManager: No stages are running, but numRunningTasks != 0
    AUDIT deleteExecution$: [HETL032][e_carbon][Thread-1]Delete data operation is failed for table
    ERROR deleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null

After the delete, I run:

    cc.sql(s"""select count(*) from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4 where ORDER_ITEM_PROC_ATTR_ID in ( $temp1) """).show

The result is 1000.

The delete only succeeds with at most 200 ids per batch, and each batch takes about 1 minute, which is too slow. So my question is: how can I tune the performance to make the batch larger and the delete faster?

[hidden email]
Sun Erhan [FFCS Research Institute]
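The batching approach described above (prefetch the ids, then delete them in groups small enough to succeed) can be sketched as follows. This is only a minimal illustration, not a CarbonData API: `BatchedDelete` and `deleteStatements` are hypothetical names, the ids are assumed to be numeric (no quoting is applied), and the batch size of 200 is simply the limit reported in this message. It does not explain why larger batches fail, and it does not make each delete faster.

```scala
// Hypothetical helper (not part of CarbonData or Spark): split the prefetched
// ids into fixed-size batches and build one DELETE statement per batch.
object BatchedDelete {
  def deleteStatements(table: String, column: String,
                       ids: Seq[Any], batchSize: Int): Seq[String] = {
    require(batchSize > 0, "batchSize must be positive")
    ids.grouped(batchSize).map { batch =>
      // Assumes numeric ids; string ids would need quoting and escaping.
      s"delete from $table where $column in (${batch.mkString(",")})"
    }.toList
  }
}

// Usage in spark-shell, assuming `cc` is the CarbonContext used above and
// `ids` is the collected array of id values:
//   BatchedDelete
//     .deleteStatements("e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4",
//                       "ORDER_ITEM_PROC_ATTR_ID", ids, 200)
//     .foreach(stmt => cc.sql(stmt).show)
```

Keeping the statement builder separate from the `cc.sql` call makes it possible to inspect or log each batch before running it.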
Hi sunerhan
I tested on my local machine and can delete more than 1000 rows in one batch. We need to reproduce this error:

    ERROR deleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null

Regards
Liang

2017-05-23 11:52 GMT+08:00 [hidden email] <[hidden email]>:

> as subquery is not supported in spark1.6+carbon1.1.0,I decide to prefetch
> id values in scala list:
> spark-shell>>
> var temp1=cc.sql("select id from table where limit
> 1000").select("id").rdd.map(r => r(0)).collect().mkString(",")
> cc.sql(s"""delete from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4 where
> ORDER_ITEM_PROC_ATTR_ID in ( $temp1) """).show
> and get error info like following:
> WARN ExecutorAllocationManager: No stages are running, but
> numRunningTasks != 0
> AUDIT deleteExecution$: [HETL032][e_carbon][Thread-1]Delete data
> operation is failed for table
> ERROR deleteExecution$: main Delete data operation is failed due to
> failure in creating delete delta file for segment : null block : null
> after deleting,i run:
> cc.sql(s"""select count(*) from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4
> where ORDER_ITEM_PROC_ATTR_ID in ( $temp1) """).show
> the result is 1000
> It only delete success maximun at 200 a batch,and took about 1min which is
> too slow.
> SO my question is how to tuning the performance to make the batch larger
> and delete faster
>
>
> [hidden email]