Here is the code:

```
val df = func(rdd)
```

and the configurations:

{ … }
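The full write call and configuration were not included in the post. Judging from the stack trace below (CarbonDataFrameWriter reached through DataFrameWriter.save on the "carbondata" source), it presumably looks something like the following sketch. The table name comes from the error message; the option names and method name are assumptions, not confirmed from the poster's code:

```
// Hypothetical reconstruction of the write path implied by the stack
// trace below (Spark 1.6-era DataFrameWriter + CarbonData source).
// Not the poster's actual code: `func`, the options, and the method
// name are assumptions.
import org.apache.spark.sql.{DataFrame, SaveMode}

def writeCarbon(df: DataFrame): Unit = {
  df.write
    .format("carbondata")             // resolves to org.apache.spark.sql.CarbonSource
    .option("tableName", "carbon1")   // the table LoadTable.run tries to look up
    .mode(SaveMode.Append)            // Append expects the table to already exist
    .save()
}
```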
Here is the error:
```
ERROR 25-11 18:13:40,116 - Data loading failed. table not found: default.carbon1
AUDIT 25-11 18:13:40,118 - [allwefantasy][allwefantasy][Thread-98]Data loading failed. table not found: default.carbon1
INFO 25-11 18:13:40,119 - Finished job streaming job 1480068820000 ms.0 from job set of time 1480068820000 ms
INFO 25-11 18:13:40,119 - Total delay: 0.119 s for time 1480068820000 ms (execution: 0.106 s)
INFO 25-11 18:13:40,120 - Removing RDD 4 from persistence list
java.lang.RuntimeException: Data loading failed. table not found: default.carbon1
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1040)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.loadDataFrame(CarbonDataFrameWriter.scala:132)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:52)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:43)
    at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:112)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:61)
    at streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:53)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
```
In reply to this post by ZhuWilliam
When I change the SaveMode from Append to Overwrite, the error is even stranger.
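The only difference, presumably, is the save mode on the same hypothetical write call (imports and assumptions as in the sketch above):

```
// Same assumed write path as before, switching only the SaveMode.
// In Overwrite mode, CarbonData drops the table if it already exists,
// re-creates it, and then loads the data, which is what the log below
// attempts. Table name taken from the log; still a sketch.
def writeCarbonOverwrite(df: DataFrame): Unit = {
  df.write
    .format("carbondata")
    .option("tableName", "carbon2")   // the log below targets default.carbon2
    .mode(SaveMode.Overwrite)         // was SaveMode.Append
    .save()
}
```

Here is the resulting log: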
```
INFO 25-11 20:19:46,572 - streaming-job-executor-0 Query [
CREATE TABLE IF NOT EXISTS DEFAULT.CARBON2
(A STRING, B STRING)
STORED BY 'ORG.APACHE.CARBONDATA.FORMAT'
]
INFO 25-11 20:19:46,656 - Parsing command:
CREATE TABLE IF NOT EXISTS default.carbon2
(a STRING, b STRING)
STORED BY 'org.apache.carbondata.format'
INFO 25-11 20:19:46,663 - Parse Completed
AUDIT 25-11 20:19:46,860 - [allwefantasy][allwefantasy][Thread-100]Creating Table with Database name [default] and Table name [carbon2]
INFO 25-11 20:19:46,889 - 1: get_tables: db=default pat=.*
INFO 25-11 20:19:46,889 - ugi=allwefantasy ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
INFO 25-11 20:19:46,889 - 1: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
INFO 25-11 20:19:46,891 - ObjectStore, initialize called
INFO 25-11 20:19:46,897 - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
INFO 25-11 20:19:46,898 - Using direct SQL, underlying DB is MYSQL
INFO 25-11 20:19:46,898 - Initialized ObjectStore
INFO 25-11 20:19:46,954 - streaming-job-executor-0 Table block size not specified for default_carbon2. Therefore considering the default value 1024 MB
INFO 25-11 20:19:46,978 - Table carbon2 for Database default created successfully.
INFO 25-11 20:19:46,978 - streaming-job-executor-0 Table carbon2 for Database default created successfully.
AUDIT 25-11 20:19:46,978 - [allwefantasy][allwefantasy][Thread-100]Creating timestamp file for default.carbon2
INFO 25-11 20:19:46,979 - streaming-job-executor-0 Query [CREATE TABLE DEFAULT.CARBON2 USING CARBONDATA OPTIONS (TABLENAME "DEFAULT.CARBON2", TABLEPATH "FILE:///TMP/CARBONDATA/STORE/DEFAULT/CARBON2") ]
INFO 25-11 20:19:47,033 - 1: get_table : db=default tbl=carbon2
INFO 25-11 20:19:47,034 - ugi=allwefantasy ip=unknown-ip-addr cmd=get_table : db=default tbl=carbon2
WARN 25-11 20:19:47,062 - Couldn't find corresponding Hive SerDe for data source provider carbondata. Persisting data source relation `default`.`carbon2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
INFO 25-11 20:19:47,247 - 1: create_table: Table(tableName:carbon2, dbName:default, owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{tableName=default.carbon2, serialization.format=1, tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null))
INFO 25-11 20:19:47,247 - ugi=allwefantasy ip=unknown-ip-addr cmd=create_table: Table(tableName:carbon2, dbName:default, owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{tableName=default.carbon2, serialization.format=1, tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null))
INFO 25-11 20:19:47,257 - Creating directory if it doesn't exist: file:/tmp/user/hive/warehouse/carbon2
AUDIT 25-11 20:19:47,564 - [allwefantasy][allwefantasy][Thread-100]Table created with Database name [default] and Table name [carbon2]
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
    at org.apache.spark.sql.hive.CarbonMetastoreCatalog.lookupRelation1(CarbonMetastoreCatalog.scala:141)
    at org.apache.spark.sql.hive.CarbonMetastoreCatalog.lookupRelation1(CarbonMetastoreCatalog.scala:127)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1044)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.loadDataFrame(CarbonDataFrameWriter.scala:132)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:52)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.saveAsCarbonFile(CarbonDataFrameWriter.scala:37)
    at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:110)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:60)
    at streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:53)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
AUDIT 25-11 20:19:47,584 - [allwefantasy][allwefantasy][Thread-100]Table Not Found: carbon2
INFO 25-11 20:19:47,588 - Finished job streaming job 1480076380000 ms.0 from job set of time 1480076380000 ms
INFO 25-11 20:19:47,590 - Total delay: 7.586 s for time 1480076380000 ms (execution: 7.547 s)
```
Hi,
In Append mode, the Carbon table is supposed to be created beforehand; otherwise the load fails because the table does not exist. In Overwrite mode, the Carbon table is created (it is dropped first if it already exists) and the data is loaded. But in your case, Overwrite mode creates the table and then reports "table not found" while loading. Can you provide a script to reproduce this issue, and also the CarbonData and Spark versions you are using?

Regards,
Ravindra.
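For the Append case from the first post, a likely workaround is to create the Carbon table before the streaming job writes to it. A minimal sketch, assuming a Spark 1.x CarbonContext and the two-column schema from the log above; `sc` and `df` are assumed to exist, and the store path mirrors the TABLEPATH seen in the log:

```
import org.apache.spark.sql.{CarbonContext, SaveMode}

// Assumed setup: CarbonContext over an existing SparkContext, with the
// store location seen in the log (file:///tmp/carbondata/store).
val cc = new CarbonContext(sc, "/tmp/carbondata/store")

// Pre-create the table with the same DDL the Overwrite path issued in
// the log, so that a subsequent SaveMode.Append can find it.
cc.sql(
  """CREATE TABLE IF NOT EXISTS default.carbon1 (a STRING, b STRING)
    |STORED BY 'org.apache.carbondata.format'""".stripMargin)

df.write
  .format("carbondata")
  .option("tableName", "carbon1")
  .mode(SaveMode.Append)   // the table now exists, so the lookup in LoadTable.run should succeed
  .save()
```

The Overwrite failure is a different matter: the log shows the table being created and then a NoSuchTableException on the immediately following lookup, which points at the catalog rather than user code, hence the request above for a reproduction script and version details.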