[jira] [Resolved] (CARBONDATA-3642) Improve error msg when string length exceeds 32000

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-3642.
-----------------------------------------
    Fix Version/s: 2.0.0
       Resolution: Fixed

> Improve error msg when string length exceeds 32000
> --------------------------------------------------
>
>                 Key: CARBONDATA-3642
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3642
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>            Reporter: Hong Shen
>            Priority: Major
>             Fix For: 2.0.0
>
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> When I run a production SQL, {code} insert overwrite TABLE table1 select * from table2 {code}
> where table1 is a carbon table, it fails with the error message:
> {code}
> Previous exception in task: Dataload failed, String length cannot exceed 32000 characters
> org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:53)
> org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
> org.apache.carbondata.spark.rdd.NewRddIterator$$anonfun$next$1.apply$mcVI$sp(NewCarbonDataLoadRDD.scala:360)
> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> org.apache.carbondata.spark.rdd.NewRddIterator.next(NewCarbonDataLoadRDD.scala:359)
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:66)
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:61)
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:179)
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:170)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:109)
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> org.apache.spark.scheduler.Task.run(Task.scala:109)
> org.apache.spark.executor.Executor$TaskRunner$$anon$2.run(Executor.scala:379)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:360)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1787)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:376)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:621)
> java.lang.Thread.run(Thread.java:849)
> at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
> at org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:107)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> ... 8 more
> {code}
> Since table1 has 61 columns, it is difficult to find which column's length exceeds the limit. Here are the columns in table1:
> {code}
> `user_id`           string
> `user_type_id`      bigint
> `loged_time`        string
> `log_time`          string
> `stay_second`       string
> `product_id`        string
> `product_version`   string
> `biz_id`            string
> `biz_app_id`        string
> `biz_app_name`      string
> `bu_app_id`         string
> `bu_app_name`       string
> `spm`               string
> `spm_a`             string
> `spm_b`             string
> `spm_name`          string
> `activity_id`       string
> `page_id`           string
> `scm`               string
> `new_scm`           string
> `scm_sys_name`      string
> `session_id`        string
> `user_session_id`   string
> `parent_spm`        string
> `parent_spm_a`      string
> `parent_spm_b`      string
> `parent_page_id`    string
> `chinfo`            string
> `new_chinfo`        string
> `channel`           string
> `landing_page_spm`  string
> `public_id`         string
> `utdid`             string
> `tcid`              string
> `ucid`              string
> `device_model`      string
> `os_version`        string
> `network`           string
> `inner_version`     string
> `app_channel`       string
> `language`          string
> `ip`                string
> `ip_country_name`   string
> `ip_province_name`  string
> `ip_city_name`      string
> `city_id`           string
> `city_name`         string
> `province_id`       string
> `province_name`     string
> `country_id`        string
> `country_abbr_name` string
> `base_exinfo`       string
> `exinfo1`           string
> `exinfo2`           string
> `exinfo3`           string
> `exinfo4`           string
> `exinfo5`           string
> `env_type`          string
> `log_type`          string
> `behavior_id`       string
> `experiment_ids`    string
> {code}
> If the error message included the column index or column name, it would be much friendlier to the user.
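> A minimal sketch of the kind of check that could produce such a message (hypothetical Scala; the real FieldConverter.objectToString in CarbonData takes additional parameters, and the columnName argument here is an assumed addition for illustration):
> {code}
> // Hypothetical sketch: pass the column name down to the converter so the
> // error identifies the offending column. Names are illustrative, not the
> // actual CarbonData API.
> def objectToString(value: Any, columnName: String): String = {
>   val stringValue = value.toString
>   if (stringValue.length > 32000) {
>     throw new IllegalArgumentException(
>       s"Dataload failed, String length cannot exceed 32000 characters " +
>       s"for column: $columnName, actual length: ${stringValue.length}")
>   }
>   stringValue
> }
> {code}
> With a message like this, a user loading the 61-column table above could immediately tell which of the string columns (for example, one of the exinfo fields) held the oversized value, instead of inspecting every column by hand.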



--
This message was sent by Atlassian Jira
(v8.3.4#803005)