Hi all,
When I execute the following query:

    LOAD DATA inpath 'hdfs://localhost:54310/csv/test.csv' INTO table employee
    options('DELIMITER'=',', 'FILEHEADER'='id, firstname');

The table schema is as follows:

    col_name    data_type   comment
    ---------   ---------   -------
    id          bigint
    firstname   string

The load sometimes succeeds, but it often fails with the following error:

    Dictionary file is locked for Updation.

The logs are below:

    AUDIT 02-01 18:17:07,009 - [knoldus][pallavi][Thread-110]Dataload failure for default.employee. Please check the logs
    INFO 02-01 18:17:07,020 - pool-30-thread-1 Successfully deleted the lock file /tmp/default/employee/meta.lock
    INFO 02-01 18:17:07,022 - Table MetaData Unlocked Successfully after data load
    ERROR 02-01 18:17:07,022 - Error executing query, currentState RUNNING,
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, 192.168.2.188): java.lang.RuntimeException: Dictionary file firstname is locked for updation. Please try after some time
        at scala.sys.package$.error(package.scala:27)
        at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:364)
        at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:302)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.generateGlobalDictionary(GlobalDictionaryUtil.scala:769)

--
Regards | Pallavi Singh
Software Consultant
Knoldus Software LLP
[hidden email]
+91-9911235949
Hello Team,
I worked through the following scenario:

- Created a Hive table *hivetest*,
- Created a carbon table *employeedemo*,
- Loaded some records into the Hive table with a load query,
- Executed the insert query below on the carbon table to load the records from the Hive table:

    insert into table employeedemo select * from hivetest;

The query execution resulted in the following exception:

    0: jdbc:hive2://hadoop-master:10000> insert into table employeedemo select * from hivetest;
    Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 8, hadoop-slave-2): java.lang.RuntimeException: Dictionary file name is locked for updation. Please try after some time
        at scala.sys.package$.error(package.scala:27)
        at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:364)
        at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:302)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The above error appears 1 out of 3 times the query is executed. The same error occurs when loading records from one carbon table into another carbon table with an insert query.

Thank you.

Best Regards |
*Harsh Sharma*
Sr. Software Consultant
Facebook <https://www.facebook.com/harsh.sharma.161446> | Twitter <https://twitter.com/harsh_sharma5> | Linked In <https://www.linkedin.com/in/harsh-sharma-0a08a1b0?trk=hp-identity-name>
[hidden email]
Skype: khandal60
+91-8447307237

On Tue, Jan 3, 2017 at 12:02 PM, Pallavi Singh <[hidden email]> wrote:
> [...]
I think you can have a look at this mailing list thread:
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-updation-td5076.html

Also have a look at the following guide, and pay attention to the carbon.properties file:
https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide

For Spark YARN cluster mode:
1. Both the driver side and the executor side need the same carbon.properties file.
2. Set carbon.lock.type=HDFSLOCK
3. Set carbon.properties.filepath:
   spark.executor.extraJavaOptions -Dcarbon.properties.filepath=<absolute path carbon.properties>
   spark.driver.extraJavaOptions -Dcarbon.properties.filepath=<absolute path carbon.properties>

Best Regards
David Cai
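
The three settings listed in the reply above could be sanity-checked before a load is retried. The following is only a minimal sketch, not part of CarbonData or Spark: the function names (`parse_carbon_properties`, `parse_spark_defaults`, `check_lock_config`) are hypothetical, and it assumes carbon.properties uses `key=value` lines while the Spark options live in a whitespace-separated spark-defaults.conf, as written in the reply. It only inspects the text of the two files; it cannot confirm that the driver and the executors actually read identical copies.

```python
# Hypothetical helper: checks the settings named in the reply above.
FLAG = "-Dcarbon.properties.filepath="


def parse_carbon_properties(text):
    """Parse key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_spark_defaults(text):
    """Parse 'key<whitespace>value' lines, skipping blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def check_lock_config(carbon_text, spark_text):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    carbon = parse_carbon_properties(carbon_text)
    if carbon.get("carbon.lock.type") != "HDFSLOCK":
        problems.append("carbon.lock.type is not HDFSLOCK")
    spark = parse_spark_defaults(spark_text)
    for key in ("spark.driver.extraJavaOptions",
                "spark.executor.extraJavaOptions"):
        options = spark.get(key, "").split()
        if not any(opt.startswith(FLAG) for opt in options):
            problems.append(key + " does not set " + FLAG.rstrip("="))
    return problems
```

To use it, read the two files (for example with `pathlib.Path(...).read_text()`, using whatever paths apply on your cluster) and pass their contents to `check_lock_config`; any returned strings name the setting that still needs attention.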