Hong Shen created CARBONDATA-3642:
-------------------------------------

             Summary: Improve error msg when string length exceeds 32000
                 Key: CARBONDATA-3642
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3642
             Project: CarbonData
          Issue Type: Improvement
          Components: spark-integration
            Reporter: Hong Shen


When I run a production SQL statement,
{code}
insert overwrite TABLE table1 select * from table2
{code}
where table1 is a carbon table, it fails with the error message:
{code}
Previous exception in task: Dataload failed, String length cannot exceed 32000 characters
	org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:53)
	org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
	org.apache.carbondata.spark.rdd.NewRddIterator$$anonfun$next$1.apply$mcVI$sp(NewCarbonDataLoadRDD.scala:360)
	scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
	org.apache.carbondata.spark.rdd.NewRddIterator.next(NewCarbonDataLoadRDD.scala:359)
	org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:66)
	org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:61)
	org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:179)
	org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:170)
	scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
	scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
	org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
	org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:109)
	org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
	org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
	org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
	org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	org.apache.spark.scheduler.Task.run(Task.scala:109)
	org.apache.spark.executor.Executor$TaskRunner$$anon$2.run(Executor.scala:379)
	java.security.AccessController.doPrivileged(Native Method)
	javax.security.auth.Subject.doAs(Subject.java:360)
	org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1787)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:376)
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:621)
	java.lang.Thread.run(Thread.java:849)
	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
	at org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:107)
	at org.apache.spark.scheduler.Task.run(Task.scala:114)
	... 8 more
{code}
Since table1 has 61 columns, it is difficult to find which column's length exceeds the limit. Here are the columns of table1:
{code}
`user_id` string
`user_type_id` bigint
`loged_time` string
`log_time` string
`stay_second` string
`product_id` string
`product_version` string
`biz_id` string
`biz_app_id` string
`biz_app_name` string
`bu_app_id` string
`bu_app_name` string
`spm` string
`spm_a` string
`spm_b` string
`spm_name` string
`activity_id` string
`page_id` string
`scm` string
`new_scm` string
`scm_sys_name` string
`session_id` string
`user_session_id` string
`parent_spm` string
`parent_spm_a` string
`parent_spm_b` string
`parent_page_id` string
`chinfo` string
`new_chinfo` string
`channel` string
`landing_page_spm` string
`public_id` string
`utdid` string
`tcid` string
`ucid` string
`device_model` string
`os_version` string
`network` string
`inner_version` string
`app_channel` string
`language` string
`ip` string
`ip_country_name` string
`ip_province_name` string
`ip_city_name` string
`city_id` string
`city_name` string
`province_id` string
`province_name` string
`country_id` string
`country_abbr_name` string
`base_exinfo` string
`exinfo1` string
`exinfo2` string
`exinfo3` string
`exinfo4` string
`exinfo5` string
`env_type` string
`log_type` string
`behavior_id` string
`experiment_ids` string
{code}
If the error message included the column index or column name, it would be much friendlier to the user.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
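A minimal sketch of the suggested improvement. This is hypothetical, not the actual patch: it assumes the caller (e.g. CarbonScalaUtil.getString, which walks the row's fields) can pass the column index and name down to the conversion routine, so the error names the offending column. The object and parameter names here are illustrative only.

{code}
// Hypothetical sketch, modeled on FieldConverter.objectToString.
object FieldConverterSketch {

  val MaxStringLength: Int = 32000

  // Convert a field value to String; if a String field is too long,
  // report which column (index and name) violated the limit.
  def objectToString(value: Any, columnIndex: Int, columnName: String): String = {
    value match {
      case s: String =>
        if (s.length > MaxStringLength) {
          throw new IllegalArgumentException(
            s"Dataload failed, String length cannot exceed $MaxStringLength characters: " +
            s"column $columnIndex (`$columnName`) has length ${s.length}")
        }
        s
      case other =>
        String.valueOf(other)
    }
  }
}
{code}
With this change, the failure above would read something like "column 52 (`base_exinfo`) has length 45678" (column number invented for illustration), so the user can fix or truncate that column instead of inspecting all 61.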