http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/insert-into-carbon-table-failed-tp9609p9673.html
successful. Can you make sure that the latest jar is updated on all the
datanodes and the driver? There is a possibility that the old jar is still being used.
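As a quick check, one way to make sure both the driver and the executors pick up the rebuilt jar might be to pass it explicitly when starting spark-shell; the path below is a placeholder for wherever the new shaded jar was copied:

# Placeholder path; point both flags at the freshly built shade jar.
./bin/spark-shell \
  --master yarn-client \
  --jars /path/to/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar \
  --driver-class-path /path/to/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar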
> I downloaded the newest source code (master) and compiled it, generating the jar
> carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar.
> Then I tested with Spark 2.1 again. The error logs are as follows:
>
>
> Container log :
> 17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch
> worker-9 Data Loading failed for table carbon_table
> java.lang.NullPointerException
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
> ... 10 more
> 17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch
> worker-9
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
> ... 10 more
> 17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage
> 2.0 (TID 538)
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
> ... 10 more
>
>
>
> Spark log:
>
> ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting job
> ERROR 27-03 02:27:21,419 - main load data frame failed
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
> ... 10 more
>
>
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$
> scheduler$DAGScheduler$$failJobAndIndependentStages(
> DAGScheduler.scala:1431)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1419)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1418)
> at scala.collection.mutable.ResizableArray$class.foreach(
> ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(
> ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(
> DAGScheduler.scala:1418)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(
> DAGScheduler.scala:799)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> doOnReceive(DAGScheduler.scala:1640)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> onReceive(DAGScheduler.scala:1599)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> onReceive(DAGScheduler.scala:1588)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.runJob(
> DAGScheduler.scala:620)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.
> scala:927)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:111)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
> at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadDataFrame$1(CarbonDataRDDFactory.scala:665)
> at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadCarbonData(CarbonDataRDDFactory.scala:794)
> at org.apache.spark.sql.execution.command.LoadTable.
> run(carbonTableSchema.scala:579)
> at org.apache.spark.sql.execution.command.LoadTableByInsert.run(
> carbonTableSchema.scala:297)
> at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult$lzycompute(commands.scala:58)
> at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult(commands.scala:56)
> at org.apache.spark.sql.execution.ExecutedCommand.
> doExecute(commands.scala:70)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(
> SparkPlan.scala:130)
> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(
> QueryExecution.scala:55)
> at org.apache.spark.sql.execution.QueryExecution.
> toRdd(QueryExecution.scala:55)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>
> (<console>:31)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<
> console>:36)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>
> :38)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
> at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
> at $line23.$read$$iwC$$iwC.<init>(<console>:46)
> at $line23.$read$$iwC.<init>(<console>:48)
> at $line23.$read.<init>(<console>:50)
> at $line23.$read$.<init>(<console>:54)
> at $line23.$read$.<clinit>(<console>)
> at $line23.$eval$.<init>(<console>:7)
> at $line23.$eval$.<clinit>(<console>)
> at $line23.$eval.$print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(
> SparkIMain.scala:1065)
> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(
> SparkIMain.scala:1346)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(
> SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:819)
> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(
> SparkILoop.scala:857)
> at org.apache.spark.repl.SparkILoop.interpretStartingWith(
> SparkILoop.scala:902)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> at org.apache.spark.repl.SparkILoop.processLine$1(
> SparkILoop.scala:657)
> at org.apache.spark.repl.SparkILoop.innerLoop$1(
> SparkILoop.scala:665)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$loop(SparkILoop.scala:670)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(
> ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
> ... 10 more
> ERROR 27-03 02:27:21,422 - main
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
> ... 10 more
>
>
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$
> scheduler$DAGScheduler$$failJobAndIndependentStages(
> DAGScheduler.scala:1431)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1419)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1418)
> at scala.collection.mutable.ResizableArray$class.foreach(
> ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(
> ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(
> DAGScheduler.scala:1418)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(
> DAGScheduler.scala:799)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> doOnReceive(DAGScheduler.scala:1640)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> onReceive(DAGScheduler.scala:1599)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> onReceive(DAGScheduler.scala:1588)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.runJob(
> DAGScheduler.scala:620)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.
> scala:927)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:111)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
> at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadDataFrame$1(CarbonDataRDDFactory.scala:665)
> at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadCarbonData(CarbonDataRDDFactory.scala:794)
> at org.apache.spark.sql.execution.command.LoadTable.
> run(carbonTableSchema.scala:579)
> at org.apache.spark.sql.execution.command.LoadTableByInsert.run(
> carbonTableSchema.scala:297)
> at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult$lzycompute(commands.scala:58)
> at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult(commands.scala:56)
> at org.apache.spark.sql.execution.ExecutedCommand.
> doExecute(commands.scala:70)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(
> SparkPlan.scala:130)
> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(
> QueryExecution.scala:55)
> at org.apache.spark.sql.execution.QueryExecution.
> toRdd(QueryExecution.scala:55)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>
> (<console>:31)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<
> console>:36)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>
> :38)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
> at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
> at $line23.$read$$iwC$$iwC.<init>(<console>:46)
> at $line23.$read$$iwC.<init>(<console>:48)
> at $line23.$read.<init>(<console>:50)
> at $line23.$read$.<init>(<console>:54)
> at $line23.$read$.<clinit>(<console>)
> at $line23.$eval$.<init>(<console>:7)
> at $line23.$eval$.<clinit>(<console>)
> at $line23.$eval.$print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(
> SparkIMain.scala:1065)
> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(
> SparkIMain.scala:1346)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(
> SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:819)
> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(
> SparkILoop.scala:857)
> at org.apache.spark.repl.SparkILoop.interpretStartingWith(
> SparkILoop.scala:902)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> at org.apache.spark.repl.SparkILoop.processLine$1(
> SparkILoop.scala:657)
> at org.apache.spark.repl.SparkILoop.innerLoop$1(
> SparkILoop.scala:665)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$loop(SparkILoop.scala:670)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(
> ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
> at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
> at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
> ... 10 more
> AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for
> default.carbon_table
> ERROR 27-03 02:27:21,453 - main
> java.lang.Exception: DataLoad failure: Data Loading failed for table
> carbon_table
> at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadCarbonData(CarbonDataRDDFactory.scala:937)
> at org.apache.spark.sql.execution.command.LoadTable.
> run(carbonTableSchema.scala:579)
> at org.apache.spark.sql.execution.command.LoadTableByInsert.run(
> carbonTableSchema.scala:297)
> at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult$lzycompute(commands.scala:58)
> at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult(commands.scala:56)
> at org.apache.spark.sql.execution.ExecutedCommand.
> doExecute(commands.scala:70)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(
> SparkPlan.scala:130)
> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(
> QueryExecution.scala:55)
> at org.apache.spark.sql.execution.QueryExecution.
> toRdd(QueryExecution.scala:55)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>
> (<console>:31)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<
> console>:36)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>
> :38)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
> at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
> at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
> at $line23.$read$$iwC$$iwC.<init>(<console>:46)
> at $line23.$read$$iwC.<init>(<console>:48)
> at $line23.$read.<init>(<console>:50)
> at $line23.$read$.<init>(<console>:54)
> at $line23.$read$.<clinit>(<console>)
> at $line23.$eval$.<init>(<console>:7)
> at $line23.$eval$.<clinit>(<console>)
> at $line23.$eval.$print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(
> SparkIMain.scala:1065)
> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(
> SparkIMain.scala:1346)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(
> SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:819)
> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(
> SparkILoop.scala:857)
> at org.apache.spark.repl.SparkILoop.interpretStartingWith(
> SparkILoop.scala:902)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> at org.apache.spark.repl.SparkILoop.processLine$1(
> SparkILoop.scala:657)
> at org.apache.spark.repl.SparkILoop.innerLoop$1(
> SparkILoop.scala:665)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$loop(SparkILoop.scala:670)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(
> ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for
> default.carbon_table. Please check the logs
> java.lang.Exception: DataLoad failure: Data Loading failed for table
> carbon_table
> at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadCarbonData(CarbonDataRDDFactory.scala:937)
> at org.apache.spark.sql.execution.command.LoadTable.
> run(carbonTableSchema.scala:579)
> at org.apache.spark.sql.execution.command.LoadTableByInsert.run(
> carbonTableSchema.scala:297)
> at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult$lzycompute(commands.scala:58)
> at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult(commands.scala:56)
> at org.apache.spark.sql.execution.ExecutedCommand.
> doExecute(commands.scala:70)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(
> SparkPlan.scala:130)
> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(
> QueryExecution.scala:55)
> at org.apache.spark.sql.execution.QueryExecution.
> toRdd(QueryExecution.scala:55)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
> at $iwC$$iwC$$iwC.<init>(<console>:44)
> at $iwC$$iwC.<init>(<console>:46)
> at $iwC.<init>(<console>:48)
> at <init>(<console>:50)
> at .<init>(<console>:54)
> at .<clinit>(<console>)
> at .<init>(<console>:7)
> at .<clinit>(<console>)
> at $print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(
> SparkIMain.scala:1065)
> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(
> SparkIMain.scala:1346)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(
> SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:819)
> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(
> SparkILoop.scala:857)
> at org.apache.spark.repl.SparkILoop.interpretStartingWith(
> SparkILoop.scala:902)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> at org.apache.spark.repl.SparkILoop.processLine$1(
> SparkILoop.scala:657)
> at org.apache.spark.repl.SparkILoop.innerLoop$1(
> SparkILoop.scala:665)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$loop(SparkILoop.scala:670)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(
> ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> At 2017-03-27 00:42:28, "a" <[hidden email]> wrote:
>
>
>
> Container log: ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM.
> Spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
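> One way to act on that hint, assuming the job is still launched the way shown later in this thread, might be to raise the overhead explicitly when starting spark-shell; the 4096 MB value below is only an illustration and needs tuning:
>
> # Illustrative only: spark.yarn.executor.memoryOverhead is specified in MB.
> ./bin/spark-shell \
>   --master yarn-client \
>   --executor-memory 20G \
>   --conf spark.yarn.executor.memoryOverhead=4096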
> The test sql
>
>
>
>
>
>
>
> At 2017-03-26 23:34:36, "a" <[hidden email]> wrote:
> >
> >
> >I have set the following parameters (a sketch of how the HDFS settings can be passed to the Spark session appears after this list):
> >1. fs.hdfs.impl.disable.cache=true
> >2. dfs.socket.timeout=1800000 (for the exception "Caused by: java.io.IOException: Filesystem closed")
> >3. dfs.datanode.socket.write.timeout=3600000
> >4. set the CarbonData property enable.unsafe.sort=true
> >5. removed the BUCKETCOLUMNS property from the CREATE TABLE statement
> >6. set the Spark job parameter executor-memory=48G (up from 20G)
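> >(A minimal sketch of one way the HDFS client settings above might be passed to a spark-shell session, assuming they are not already set in the cluster's hdfs-site.xml; Spark copies spark.hadoop.* properties into the Hadoop Configuration it creates:)
> >
> ># Sketch only; the flags mirror items 1-3 and 6 above.
> >./bin/spark-shell \
> >  --executor-memory 48G \
> >  --conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
> >  --conf spark.hadoop.dfs.socket.timeout=1800000 \
> >  --conf spark.hadoop.dfs.datanode.socket.write.timeout=3600000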
> >
> >
> >But it still failed; the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM".
> >
> >
> >Then I tried to insert 400,000,000 records into the CarbonData table, and it succeeded.
> >
> >
> >How can I insert 2,000,000,000 records into CarbonData?
> >Should I just set executor-memory big enough? Or should I generate a CSV file from the
> >Hive table first and then load the CSV file into the carbon table (see the sketch below)?
> >Can anybody give me some help?
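> >For the CSV route, a minimal sketch of what the CarbonData load could look like, assuming the Hive data has first been exported to a '|'-delimited CSV directory on HDFS (the path is a placeholder, not the real location):
> >
> >// Hypothetical CSV location; replace with the real export path.
> >cc.sql("""
> >  LOAD DATA INPATH 'hdfs://xxxx/csv/xxxx_table_dt_2017-01-01'
> >  INTO TABLE xxxx_table
> >  OPTIONS('DELIMITER'='|', 'QUOTECHAR'='"')""")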
> >
> >
> >Regards
> >fish
> >
> >
> >
> >
> >
> >
> >
> >At 2017-03-26 00:34:18, "a" <[hidden email]> wrote:
> >>Thank you Ravindra!
> >>Versions:
> >>My CarbonData version is 1.0, Spark version is 1.6.3, Hadoop version is 2.7.1, and Hive version is 1.1.0.
> >>One of the container logs:
> >>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED
> SIGNAL 15: SIGTERM
> >>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
> >>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
> >>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory
> /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/
> appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-
> 700345a84109
> >>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl:
> pool-23-thread-2
> >>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-
> 01-01/pt=ios/000006_0
> >> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(
> RecordReaderImpl.java:1046)
> >> at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$
> OriginalReaderPair.next(OrcRawRecordMerger.java:263)
> >> at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(
> OrcRawRecordMerger.java:547)
> >> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(
> OrcInputFormat.java:1234)
> >> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(
> OrcInputFormat.java:1218)
> >> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$
> NullKeyRecordReader.next(OrcInputFormat.java:1150)
> >> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$
> NullKeyRecordReader.next(OrcInputFormat.java:1136)
> >> at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(
> HadoopRDD.scala:249)
> >> at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(
> HadoopRDD.scala:211)
> >> at org.apache.spark.util.NextIterator.hasNext(
> NextIterator.scala:73)
> >> at org.apache.spark.InterruptibleIterator.hasNext(
> InterruptibleIterator.scala:39)
> >> at scala.collection.Iterator$$anon$11.hasNext(Iterator.
> scala:327)
> >> at scala.collection.Iterator$$anon$11.hasNext(Iterator.
> scala:327)
> >> at scala.collection.Iterator$$anon$11.hasNext(Iterator.
> scala:327)
> >> at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(
> NewCarbonDataLoadRDD.scala:412)
> >> at org.apache.carbondata.processing.newflow.steps.
> InputProcessorStepImpl$InputProcessorIterator.internalHasNext(
> InputProcessorStepImpl.java:163)
> >> at org.apache.carbondata.processing.newflow.steps.
> InputProcessorStepImpl$InputProcessorIterator.getBatch(
> InputProcessorStepImpl.java:221)
> >> at org.apache.carbondata.processing.newflow.steps.
> InputProcessorStepImpl$InputProcessorIterator.next(
> InputProcessorStepImpl.java:183)
> >> at org.apache.carbondata.processing.newflow.steps.
> InputProcessorStepImpl$InputProcessorIterator.next(
> InputProcessorStepImpl.java:117)
> >> at org.apache.carbondata.processing.newflow.steps.
> DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl
> .java:80)
> >> at org.apache.carbondata.processing.newflow.steps.
> DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl
> .java:73)
> >> at org.apache.carbondata.processing.newflow.sort.impl.
> ParallelReadMergeSorterImpl$SortIteratorThread.call(
> ParallelReadMergeSorterImpl.java:196)
> >> at org.apache.carbondata.processing.newflow.sort.impl.
> ParallelReadMergeSorterImpl$SortIteratorThread.call(
> ParallelReadMergeSorterImpl.java:177)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:745)
> >>Caused by: java.io.IOException: Filesystem closed
> >> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.
> java:808)
> >> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(
> DFSInputStream.java:868)
> >> at org.apache.hadoop.hdfs.DFSInputStream.read(
> DFSInputStream.java:934)
> >> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> >> at org.apache.hadoop.hive.ql.io.orc.MetadataReader.
> readStripeFooter(MetadataReader.java:112)
> >> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> readStripeFooter(RecordReaderImpl.java:228)
> >> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> beginReadStripe(RecordReaderImpl.java:805)
> >> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> readStripe(RecordReaderImpl.java:776)
> >> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> advanceStripe(RecordReaderImpl.java:986)
> >> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> advanceToNextRow(RecordReaderImpl.java:1019)
> >> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(
> RecordReaderImpl.java:1042)
> >> ... 26 more
> >>I will try setting enable.unsafe.sort=true and removing the BUCKETCOLUMNS
> >>property, and try again.
> >>
> >>
> >>At 2017-03-25 20:55:03, "Ravindra Pesala" <[hidden email]> wrote:
> >>>Hi,
> >>>
> >>>CarbonData launches one job per node to sort the data at node level and
> >>>avoid shuffling. Internally it uses threads to load in parallel. Please
> >>>use the carbon.number.of.cores.while.loading property in the carbon.properties
> >>>file to set the number of cores it should use per machine while loading.
> >>>CarbonData sorts the data at each node level to maintain the B-tree for
> >>>each node per segment. This improves query performance, because filtering
> >>>is faster with a B-tree at node level instead of at each block level.
> >>>
> >>>1. Which version of CarbonData are you using?
> >>>2. There are memory issues in the CarbonData 1.0 version; they are fixed in the
> >>>current master.
> >>>3. You can also improve performance by enabling enable.unsafe.sort=true in the
> >>>carbon.properties file (see the sketch below). However, it is not supported if
> >>>bucketing of columns is enabled. We are planning to support unsafe-sort load for
> >>>bucketing as well in the next version.
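> >>>A minimal sketch of setting those two properties programmatically, using the same CarbonProperties API that ExampleUtils uses further down this thread (the values are illustrative, not recommendations, and they can equally go into carbon.properties):
> >>>
> >>>import org.apache.carbondata.core.util.CarbonProperties
> >>>
> >>>// Cores used per machine while loading; tune to the node's CPU budget.
> >>>CarbonProperties.getInstance()
> >>>  .addProperty("carbon.number.of.cores.while.loading", "6")
> >>>// Unsafe sort speeds up loading, but in 1.0 it is not supported with BUCKETCOLUMNS tables.
> >>>CarbonProperties.getInstance()
> >>>  .addProperty("enable.unsafe.sort", "true")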
> >>>
> >>>Please send the executor log to know about the error you are facing.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>Regards,
> >>>Ravindra
> >>>
> >>>On 25 March 2017 at 16:18, [hidden email] <[hidden email]> wrote:
> >>>
> >>>> Hello!
> >>>>
> >>>> *0. The failure*
> >>>> When I insert into the carbon table, I encounter a failure. The failure is as
> >>>> follows:
> >>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
> >>>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
> >>>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
> >>>> Reason: Slave lost
> >>>> Driver stacktrace:
> >>>>
> >>>> the stage:
> >>>>
> >>>> *Steps:*
> >>>> *1. Start spark-shell*
> >>>> ./bin/spark-shell \
> >>>> --master yarn-client \
> >>>> --num-executors 5 \  (I tried values from 10 to 20 for this parameter, but the second job still has only 5 tasks)
> >>>> --executor-cores 5 \
> >>>> --executor-memory 20G \
> >>>> --driver-memory 8G \
> >>>> --queue root.default \
> >>>> --jars /xxx.jar
> >>>>
> >>>> // spark-defaults.conf: spark.default.parallelism=320
> >>>>
> >>>> import org.apache.spark.sql.CarbonContext
> >>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
> >>>>
> >>>> *2. Create the table*
> >>>> cc.sql("""
> >>>>   CREATE TABLE IF NOT EXISTS xxxx_table (
> >>>>     dt String, pt String, lst String, plat String, sty String, is_pay String,
> >>>>     is_vip String, is_mpack String, scene String, status String, nw String,
> >>>>     isc String, area String, spttag String, province String, isp String,
> >>>>     city String, tv String, hwm String, pip String, fo String, sh String,
> >>>>     mid String, user_id String, play_pv Int, spt_cnt Int, prg_spt_cnt Int)
> >>>>   row format delimited fields terminated by '|'
> >>>>   STORED BY 'carbondata'
> >>>>   TBLPROPERTIES (
> >>>>     'DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id',
> >>>>     'DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm',
> >>>>     'NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid',
> >>>>     'BUCKETNUMBER'='10',
> >>>>     'BUCKETCOLUMNS'='fo')""")
> >>>>
> >>>> // Note: the "fo" column is made a BUCKETCOLUMNS column in order to join with another table.
> >>>> // The distinct-value counts of the columns are as follows:
> >>>>
> >>>>
> >>>> *3. Insert into the table* (xxxx_table_tmp is a Hive external ORC table with 2,000,000,000 records)
> >>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
> >>>>
> >>>> *4. Spark split the SQL into two jobs; the first succeeded, but the second failed:*
> >>>>
> >>>>
> >>>> *5. The second job's stage:*
> >>>>
> >>>>
> >>>>
> >>>> *Questions:*
> >>>> 1. Why does the second job have only five tasks, while the first job has 994?
> >>>> (Note: my Hadoop cluster has 5 datanodes.)
> >>>> I guess this is what caused the failure.
> >>>> 2. In the source I found DataLoadPartitionCoalescer.class. Does it mean that
> >>>> one datanode gets only one partition, so that only one task runs on each
> >>>> datanode?
> >>>> 3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set
> >>>> as below, but I cannot find "carbon.table.split.partition.enable" in
> >>>> other parts of the project.
> >>>> I set "carbon.table.split.partition.enable" to true, but the second
> >>>> job still has only five tasks. How should this property be used?
> >>>> ExampleUtils:
> >>>> // whether to use table split partition
> >>>> // true  -> use table split partition, supports multiple-partition loading
> >>>> // false -> use node split partition, supports data load by host partition
> >>>> CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false")
> >>>> 4. Inserting into the carbon table takes 3 hours but eventually fails. How can
> >>>> I speed it up?
> >>>> 5. In spark-shell I tried setting the executor count (num-executors) from 10 to
> >>>> 20, but the second job still has only 5 tasks.
> >>>> Is the other parameter, executor-memory = 20G, enough?
> >>>>
> >>>> I need your help! Thank you very much!
> >>>>
> >>>>
[hidden email]
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>--
> >>>Thanks & Regards,
> >>>Ravi
>