[ https://issues.apache.org/jira/browse/CARBONDATA-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-3642.
-----------------------------------------
    Fix Version/s: 2.0.0
       Resolution: Fixed

> Improve error msg when string length exceed 32000
> -------------------------------------------------
>
>                 Key: CARBONDATA-3642
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3642
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>            Reporter: Hong Shen
>            Priority: Major
>             Fix For: 2.0.0
>
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> When I ran an insert SQL statement,
> {code}
> insert overwrite TABLE table1 select * from table2
> {code}
> where table1 is a carbon table, it failed with the error message:
> {code}
> Previous exception in task: Dataload failed, String length cannot exceed 32000 characters
> org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:53)
> org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
> org.apache.carbondata.spark.rdd.NewRddIterator$$anonfun$next$1.apply$mcVI$sp(NewCarbonDataLoadRDD.scala:360)
> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> org.apache.carbondata.spark.rdd.NewRddIterator.next(NewCarbonDataLoadRDD.scala:359)
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:66)
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:61)
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:179)
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:170)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:109)
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> org.apache.spark.scheduler.Task.run(Task.scala:109)
> org.apache.spark.executor.Executor$TaskRunner$$anon$2.run(Executor.scala:379)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:360)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1787)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:376)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:621)
> java.lang.Thread.run(Thread.java:849)
> at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
> at org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:107)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> ... 8 more
> {code}
> Since table1 has 61 columns, it is difficult to tell which column's length exceeded the limit. Here are the columns in table1:
> {code}
> `user_id` string
> `user_type_id` bigint
> `loged_time` string
> `log_time` string
> `stay_second` string
> `product_id` string
> `product_version` string
> `biz_id` string
> `biz_app_id` string
> `biz_app_name` string
> `bu_app_id` string
> `bu_app_name` string
> `spm` string
> `spm_a` string
> `spm_b` string
> `spm_name` string
> `activity_id` string
> `page_id` string
> `scm` string
> `new_scm` string
> `scm_sys_name` string
> `session_id` string
> `user_session_id` string
> `parent_spm` string
> `parent_spm_a` string
> `parent_spm_b` string
> `parent_page_id` string
> `chinfo` string
> `new_chinfo` string
> `channel` string
> `landing_page_spm` string
> `public_id` string
> `utdid` string
> `tcid` string
> `ucid` string
> `device_model` string
> `os_version` string
> `network` string
> `inner_version` string
> `app_channel` string
> `language` string
> `ip` string
> `ip_country_name` string
> `ip_province_name` string
> `ip_city_name` string
> `city_id` string
> `city_name` string
> `province_id` string
> `province_name` string
> `country_id` string
> `country_abbr_name` string
> `base_exinfo` string
> `exinfo1` string
> `exinfo2` string
> `exinfo3` string
> `exinfo4` string
> `exinfo5` string
> `env_type` string
> `log_type` string
> `behavior_id` string
> `experiment_ids` string
> {code}
> If the error message included the column index or column name, it would be much friendlier to the user.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
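The kind of check the reporter asks for can be sketched as follows. This is a minimal illustration in Java of a per-row length validation that names the offending column and its index; the class and method names here are hypothetical and do not reflect CarbonData's actual implementation (which lives in Scala, around FieldConverter.objectToString):

```java
import java.util.Arrays;

// Sketch of a row-level length check that reports WHICH column exceeded the
// limit. Names are hypothetical, not CarbonData's actual API.
public class StringLengthCheck {
    // CarbonData's documented limit for plain string columns.
    static final int MAX_CHARS = 32000;

    // Throws with the column name and index when a value is too long, instead
    // of only "String length cannot exceed 32000 characters".
    static void validateRow(String[] columnNames, String[] values) {
        for (int i = 0; i < values.length; i++) {
            String v = values[i];
            if (v != null && v.length() > MAX_CHARS) {
                throw new IllegalArgumentException(
                    "Dataload failed: value of length " + v.length()
                    + " exceeds " + MAX_CHARS + " characters in column '"
                    + columnNames[i] + "' (index " + i + ")");
            }
        }
    }

    public static void main(String[] args) {
        String[] names = {"user_id", "base_exinfo"};
        char[] big = new char[MAX_CHARS + 1];
        Arrays.fill(big, 'x');
        try {
            validateRow(names, new String[]{"u1", new String(big)});
        } catch (IllegalArgumentException e) {
            // Message now pinpoints column 'base_exinfo' (index 1).
            System.out.println(e.getMessage());
        }
    }
}
```

With a 61-column table like the one above, an error of this shape lets the user go straight to the oversized column rather than inspecting every string field.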