[ https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brijoo Bopanna reassigned CARBONDATA-2877:
------------------------------------------
Assignee: Brijoo Bopanna  (was: kumar vishal)

> CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2877
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2877
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 1.4.1
>         Environment: Spark 2.1
>            Reporter: Chetan Bhat
>            Assignee: Brijoo Bopanna
>            Priority: Major
>
> Steps:
> From Spark-submit, the user creates a table with a large number of columns (around 100) and tries to load around 3 lakh (300,000) records into the table.
> Spark-submit command: spark-submit --master yarn --num-executors 3 --executor-memory 75g --driver-memory 10g --executor-cores 12 --class
> Actual issue: Data loading fails with CarbonDataWriterException.
> Executor YARN UI log:
> org.apache.spark.util.TaskCompletionListenerException: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException
> Previous exception in task: Error while initializing data handler :
>     org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141)
>     org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
>     org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:221)
>     org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197)
>     org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     org.apache.spark.scheduler.Task.run(Task.scala:99)
>     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     java.lang.Thread.run(Thread.java:748)
> at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
> at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
> Expected: The data loading should be successful from Spark-submit, similar to that in Beeline.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
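The reproduction steps above (a table with ~100 columns, ~3 lakh rows loaded) can be sketched by generating the corresponding SQL. This is a minimal, hypothetical sketch: the column names, types, table name, and CSV path are invented for illustration and are not taken from the issue; the CREATE TABLE syntax assumes the `STORED BY 'carbondata'` form used in CarbonData 1.4.x.

```python
# Hypothetical sketch of the reproduction described in the issue.
# Column names/types, table name, and the CSV path are assumptions,
# not details from the report.

def create_wide_table_ddl(table: str, num_columns: int = 100) -> str:
    """Build a CREATE TABLE statement with num_columns string columns,
    mirroring the 'large number of columns (around 100)' in the report."""
    cols = ", ".join(f"col{i} string" for i in range(num_columns))
    return f"CREATE TABLE {table} ({cols}) STORED BY 'carbondata'"

def load_data_sql(table: str, csv_path: str) -> str:
    """Build the LOAD DATA statement that would ingest the ~300,000-row CSV."""
    return f"LOAD DATA INPATH '{csv_path}' INTO TABLE {table}"

if __name__ == "__main__":
    # These statements would be run through spark.sql(...) in the
    # spark-submit application from the report.
    print(create_wide_table_ddl("wide_table", 100))
    print(load_data_sql("wide_table", "hdfs:///tmp/data_300k_rows.csv"))
```

In the report's scenario these statements would be issued via `spark.sql(...)` inside the application jar passed to `spark-submit --class ...`; the same statements reportedly succeed when run through Beeline.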