Hi,
I am using Spark Streaming to consume from Kafka. When I save the DataFrame to a Carbon table, why does the log show "LOAD DATA INPATH './TEMPCSV'"?

Following is the code:

result.show()
result.write
  .format("carbondata")
  .option("tableName", "sale")
  .mode(SaveMode.Append)
  .save()

Following are the details. Thanks.

INFO 14-12 10:25:00,190 - Starting job streaming job 1481682300000 ms.0 from job set of time 1481682300000 ms
INFO 14-12 10:25:00,244 - streaming-job-executor-0 Property file path: /data08/hadoop/yarn/local/usercache/hdfs/appcache/application_1481679069818_0015/container_e16_1481679069818_0015_01_000001/../../../conf/carbon.properties
INFO 14-12 10:25:00,245 - streaming-job-executor-0 ------Using Carbon.properties --------
INFO 14-12 10:25:00,245 - streaming-job-executor-0 {}
INFO 14-12 10:25:00,475 - streaming-job-executor-0 Table block size not specified for default_carbontest. Therefore considering the default value 1024 MB
INFO 14-12 10:25:00,669 - streaming-job-executor-0 Table block size not specified for default_default_table. Therefore considering the default value 1024 MB
INFO 14-12 10:25:00,682 - streaming-job-executor-0 Table block size not specified for default_sale. Therefore considering the default value 1024 MB
INFO 14-12 10:25:01,010 - streaming-job-executor-0 Query [SELECT * FROM SALE_TMP]
INFO 14-12 10:25:01,294 - Parsing command: select * from sale_tmp
INFO 14-12 10:25:02,020 - Parse Completed
INFO 14-12 10:25:02,037 - Parsing command: select * from sale_tmp
INFO 14-12 10:25:02,038 - Parse Completed
INFO 14-12 10:25:02,319 - Starting job: show at SaleStoreApp.scala:33
INFO 14-12 10:25:02,331 - Got job 0 (show at SaleStoreApp.scala:33) with 1 output partitions
INFO 14-12 10:25:02,332 - Final stage: ResultStage 0 (show at SaleStoreApp.scala:33)
INFO 14-12 10:25:02,332 - Parents of final stage: List()
INFO 14-12 10:25:02,334 - Missing parents: List()
INFO 14-12 10:25:02,340 - Submitting ResultStage 0 (MapPartitionsRDD[8] at show at SaleStoreApp.scala:33), which has no missing parents
INFO 14-12 10:25:02,403 - Block broadcast_0 stored as values in memory (estimated size 10.4 KB, free 10.4 KB)
INFO 14-12 10:25:02,412 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.9 KB, free 15.3 KB)
INFO 14-12 10:25:02,413 - Added broadcast_0_piece0 in memory on 192.168.9.4:49953 (size: 4.9 KB, free: 1823.2 MB)
INFO 14-12 10:25:02,415 - Created broadcast 0 from broadcast at DAGScheduler.scala:1008
INFO 14-12 10:25:02,419 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[8] at show at SaleStoreApp.scala:33)
INFO 14-12 10:25:02,420 - Adding task set 0.0 with 1 tasks
INFO 14-12 10:25:02,444 - Starting task 0.0 in stage 0.0 (TID 0, dpnode05, partition 0,NODE_LOCAL, 2012 bytes)
INFO 14-12 10:25:02,912 - Added broadcast_0_piece0 in memory on dpnode05:35812 (size: 4.9 KB, free: 3.4 GB)
INFO 14-12 10:25:04,648 - Finished task 0.0 in stage 0.0 (TID 0) in 2213 ms on dpnode05 (1/1)
INFO 14-12 10:25:04,652 - Removed TaskSet 0.0, whose tasks have all completed, from pool
INFO 14-12 10:25:04,656 - ResultStage 0 (show at SaleStoreApp.scala:33) finished in 2.224 s
INFO 14-12 10:25:04,664 - Job 0 finished: show at SaleStoreApp.scala:33, took 2.344778 s
INFO 14-12 10:25:04,678 - Starting job: show at SaleStoreApp.scala:33
INFO 14-12 10:25:04,681 - Got job 1 (show at SaleStoreApp.scala:33) with 2 output partitions
INFO 14-12 10:25:04,681 - Final stage: ResultStage 1 (show at SaleStoreApp.scala:33)
INFO 14-12 10:25:04,681 - Parents of final stage: List()
INFO 14-12 10:25:04,682 - Missing parents: List()
INFO 14-12 10:25:04,683 - Submitting ResultStage 1 (MapPartitionsRDD[8] at show at SaleStoreApp.scala:33), which has no missing parents
INFO 14-12 10:25:04,695 - Block broadcast_1 stored as values in memory (estimated size 10.4 KB, free 25.7 KB)
INFO 14-12 10:25:04,698 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.9 KB, free 30.5 KB)
INFO 14-12 10:25:04,700 - Added broadcast_1_piece0 in memory on 192.168.9.4:49953 (size: 4.9 KB, free: 1823.2 MB)
INFO 14-12 10:25:04,701 - Created broadcast 1 from broadcast at DAGScheduler.scala:1008
INFO 14-12 10:25:04,702 - Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[8] at show at SaleStoreApp.scala:33)
INFO 14-12 10:25:04,702 - Adding task set 1.0 with 2 tasks
INFO 14-12 10:25:04,706 - Starting task 1.0 in stage 1.0 (TID 1, dpnode03, partition 2,NODE_LOCAL, 2012 bytes)
INFO 14-12 10:25:04,708 - Starting task 0.0 in stage 1.0 (TID 2, dpnode04, partition 1,NODE_LOCAL, 2012 bytes)
INFO 14-12 10:25:04,949 - Added broadcast_1_piece0 in memory on dpnode03:36367 (size: 4.9 KB, free: 3.4 GB)
INFO 14-12 10:25:05,159 - Added broadcast_1_piece0 in memory on dpnode04:41109 (size: 4.9 KB, free: 3.4 GB)
INFO 14-12 10:25:06,587 - Finished task 1.0 in stage 1.0 (TID 1) in 1881 ms on dpnode03 (1/2)
INFO 14-12 10:25:06,868 - ResultStage 1 (show at SaleStoreApp.scala:33) finished in 2.164 s
INFO 14-12 10:25:06,868 - Finished task 0.0 in stage 1.0 (TID 2) in 2160 ms on dpnode04 (2/2)
INFO 14-12 10:25:06,868 - Job 1 finished: show at SaleStoreApp.scala:33, took 2.189452 s
INFO 14-12 10:25:06,868 - Removed TaskSet 1.0, whose tasks have all completed, from pool

+--------------------+--------------------+------------+-------------+-------+-------------+-------------+-------+-----------+------------+------------+---------+---------+--------+
|                  id|            store_id|  order_code|checkout_date|saleamt|invoice_price|bill_sale_off|giveamt|member_code|cashier_code|cashier_name|item_cont|give_type|saletype|
+--------------------+--------------------+------------+-------------+-------+-------------+-------------+-------+-----------+------------+------------+---------+---------+--------+
|f34d3a92-d3d8-41d...|d166fd97-93dd-414...|XS1608250002|1472116276000|    4.0|          4.0|          0.0|    0.0|           |    00000001|       admin|      1.0|      0.0|       1|
|26399c17-858c-4a8...|d166fd97-93dd-414...|XS1608250003|1472116308000|    9.0|          9.0|          0.0|    0.0|           |    00000001|       admin|      2.0|      0.0|       1|
|1cbe7ac2-d754-464...|d166fd97-93dd-414...|XS1608250001|1472116093000|   12.5|         12.5|          0.0|    0.0|           |    00000001|       admin|      2.0|      0.0|       1|
|1532bde2-ae14-404...|d166fd97-93dd-414...|XS1608250004|1472116363000|  360.0|        360.0|          0.0|    0.0|           |    00000001|       admin|      3.0|      0.0|       1|
+--------------------+--------------------+------------+-------------+-------+-------------+-------------+-------+-----------+------------+------------+---------+---------+--------+

INFO 14-12 10:25:06,971 - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
INFO 14-12 10:25:06,971 - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
INFO 14-12 10:25:06,971 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
INFO 14-12 10:25:06,972 - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
INFO 14-12 10:25:06,972 - mapred.job.id is deprecated. Instead, use mapreduce.job.id
INFO 14-12 10:25:06,975 - File Output Committer Algorithm version is 2
INFO 14-12 10:25:06,975 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
INFO 14-12 10:25:07,129 - Starting job: saveAsTextFile at package.scala:169
INFO 14-12 10:25:07,130 - Got job 2 (saveAsTextFile at package.scala:169) with 3 output partitions
INFO 14-12 10:25:07,130 - Final stage: ResultStage 2 (saveAsTextFile at package.scala:169)
INFO 14-12 10:25:07,130 - Parents of final stage: List()
INFO 14-12 10:25:07,131 - Missing parents: List()
INFO 14-12 10:25:07,131 - Submitting ResultStage 2 (MapPartitionsRDD[12] at saveAsTextFile at package.scala:169), which has no missing parents
INFO 14-12 10:25:07,181 - Block broadcast_2 stored as values in memory (estimated size 109.9 KB, free 140.4 KB)
INFO 14-12 10:25:07,184 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 41.6 KB, free 182.0 KB)
INFO 14-12 10:25:07,185 - Added broadcast_2_piece0 in memory on 192.168.9.4:49953 (size: 41.6 KB, free: 1823.2 MB)
INFO 14-12 10:25:07,185 - Created broadcast 2 from broadcast at DAGScheduler.scala:1008
INFO 14-12 10:25:07,186 - Submitting 3 missing tasks from ResultStage 2 (MapPartitionsRDD[12] at saveAsTextFile at package.scala:169)
INFO 14-12 10:25:07,186 - Adding task set 2.0 with 3 tasks
INFO 14-12 10:25:07,188 - Starting task 2.0 in stage 2.0 (TID 3, dpnode03, partition 2,NODE_LOCAL, 2012 bytes)
INFO 14-12 10:25:07,188 - Starting task 0.0 in stage 2.0 (TID 4, dpnode05, partition 0,NODE_LOCAL, 2012 bytes)
INFO 14-12 10:25:07,189 - Starting task 1.0 in stage 2.0 (TID 5, dpnode04, partition 1,NODE_LOCAL, 2012 bytes)
INFO 14-12 10:25:07,222 - Added broadcast_2_piece0 in memory on dpnode03:36367 (size: 41.6 KB, free: 3.4 GB)
INFO 14-12 10:25:07,310 - Added broadcast_2_piece0 in memory on dpnode04:41109 (size: 41.6 KB, free: 3.4 GB)
INFO 14-12 10:25:07,319 - Added broadcast_2_piece0 in memory on dpnode05:35812 (size: 41.6 KB, free: 3.4 GB)
INFO 14-12 10:25:08,554 - Finished task 2.0 in stage 2.0 (TID 3) in 1366 ms on dpnode03 (1/3)
INFO 14-12 10:25:08,931 - Finished task 0.0 in stage 2.0 (TID 4) in 1742 ms on dpnode05 (2/3)
INFO 14-12 10:25:09,210 - Finished task 1.0 in stage 2.0 (TID 5) in 2021 ms on dpnode04 (3/3)
INFO 14-12 10:25:09,210 - ResultStage 2 (saveAsTextFile at package.scala:169) finished in 2.023 s
INFO 14-12 10:25:09,210 - Removed TaskSet 2.0, whose tasks have all completed, from pool
INFO 14-12 10:25:09,211 - Job 2 finished: saveAsTextFile at package.scala:169, took 2.081483 s
INFO 14-12 10:25:09,334 - temporary CSV file size: 5.512237548828125E-4 MB
INFO 14-12 10:25:09,336 - streaming-job-executor-0 Query [
LOAD DATA INPATH './TEMPCSV'
INTO TABLE DEFAULT.SALE
OPTIONS ('FILEHEADER' = 'ID,STORE_ID,ORDER_CODE,CHECKOUT_DATE,SALEAMT,INVOICE_PRICE,BILL_SALE_OFF,GIVEAMT,MEMBER_CODE,CASHIER_CODE,CASHIER_NAME,ITEM_CONT,GIVE_TYPE,SALETYPE')
]
INFO 14-12 10:25:09,625 - Successfully able to get the table metadata file lock
AUDIT 14-12 10:25:09,630 - [dpnode04][hdfs][Thread-188]Dataload failed for default.sale.
The input file does not exist: ./tempCSV INFO 14-12 10:25:09,631 - streaming-job-executor-0 Successfully deleted the lock file /data08/hadoop/yarn/local/usercache/hdfs/appcache/application_1481679069818_0015/container_e16_1481679069818_0015_01_000001/tmp/default/sale/meta.lock INFO 14-12 10:25:09,632 - Table MetaData Unlocked Successfully after data load INFO 14-12 10:25:09,666 - Finished job streaming job 1481682300000 ms.0 from job set of time 1481682300000 ms INFO 14-12 10:25:09,668 - Total delay: 9.662 s for time 1481682300000 ms (execution: 9.476 s) ERROR 14-12 10:25:09,672 - Error running job streaming job 1481682300000 ms.0 org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.FileUtils$.getPaths(FileUtils.scala:62) at org.apache.spark.sql.execution.command.LoadTableUsingKettle.run(carbonTableSchema.scala:1096) at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1036) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130) at org.apache.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23) at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:137) at org.apache.carbondata.spark.CarbonDataFrameWriter.loadTempCSV(CarbonDataFrameWriter.scala:84) at org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:49) at org.apache.carbondata.spark.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:42) at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:112) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148) at cn.com.jldata.app.SaleStoreApp$$anonfun$ToCarbon$1.apply(SaleStoreApp.scala:38) at cn.com.jldata.app.SaleStoreApp$$anonfun$ToCarbon$1.apply(SaleStoreApp.scala:25) at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661) at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50) at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49) at scala.util.Try$.apply(Try.scala:161) at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ERROR 14-12 10:25:09,675 - User class threw exception: org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.FileUtils$.getPaths(FileUtils.scala:62) at org.apache.spark.sql.execution.command.LoadTableUsingKettle.run(carbonTableSchema.scala:1096) at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1036) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130) at org.apache.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23) at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:137) at org.apache.carbondata.spark.CarbonDataFrameWriter.loadTempCSV(CarbonDataFrameWriter.scala:84) at org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:49) at org.apache.carbondata.spark.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:42) at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:112) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148) at cn.com.jldata.app.SaleStoreApp$$anonfun$ToCarbon$1.apply(SaleStoreApp.scala:38) at 
cn.com.jldata.app.SaleStoreApp$$anonfun$ToCarbon$1.apply(SaleStoreApp.scala:25) at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661) at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49) at scala.util.Try$.apply(Try.scala:161) at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) INFO 14-12 10:25:09,677 - Deleting batches ArrayBuffer() INFO 14-12 10:25:09,679 - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV) INFO 14-12 10:25:09,685 - remove old batch metadata: INFO 14-12 10:25:09,687 - Invoking stop(stopGracefully=false) from shutdown hook INFO 14-12 10:25:09,690 - Stopping JobGenerator immediately INFO 14-12 10:25:09,692 - Stopped timer for JobGenerator after time 1481682300000 INFO 14-12 10:25:09,694 - Stopped JobGenerator INFO 14-12 10:25:09,697 - Stopped JobScheduler INFO 14-12 10:25:09,704 - stopped o.s.j.s.ServletContextHandler{/streaming,null} INFO 14-12 10:25:09,706 - stopped o.s.j.s.ServletContextHandler{/streaming/batch,null} INFO 14-12 10:25:09,709 - stopped o.s.j.s.ServletContextHandler{/static/streaming,null} INFO 14-12 10:25:09,711 - StreamingContext stopped successfully |
Hi
Loading data into CarbonData currently only supports CSV files, so the write call above goes through these steps: DataFrame -> CSV files -> load data into the Carbon table. This is why there is a "tempCSV" folder. The next version (1.0.0) will optimize this so the DataFrame is written directly to the Carbon table.

Regards
Liang
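For reference, the two internal steps are roughly equivalent to doing them by hand in your streaming app. This is only an illustrative sketch, not the writer's exact code: the store location, the temporary CSV path, and the CarbonContext constructor arguments below are placeholders you would need to adapt, and the CSV conversion ignores quoting and escaping.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.CarbonContext

val sc = new SparkContext(new SparkConf().setAppName("SaleStoreApp"))
// Assumed carbon store location; replace with your actual store path.
val cc = new CarbonContext(sc, "hdfs:///user/hdfs/carbon/store")

val result = cc.sql("SELECT * FROM sale_tmp")

// Step 1: DataFrame -> CSV files on a path you control (naive comma join).
val csvPath = "hdfs:///user/hdfs/tmp/sale_csv"  // assumed temporary path
result.rdd.map(_.mkString(",")).saveAsTextFile(csvPath)

// Step 2: CSV files -> Carbon table, the same kind of statement the writer issues internally.
cc.sql(
  s"LOAD DATA INPATH '$csvPath' INTO TABLE default.sale " +
  "OPTIONS ('FILEHEADER' = 'id,store_id,order_code,checkout_date,saleamt,invoice_price," +
  "bill_sale_off,giveamt,member_code,cashier_code,cashier_name,item_cont,give_type,saletype')")

Writing to an explicit HDFS path like this avoids relying on the relative './tempCSV' location the writer builds for you.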
Thanks.
I am using CarbonData 0.2.0 now. In the step DataFrame -> CSV files -> load data into the Carbon table, I don't know where the CSV files are stored.

The log shows:

LOAD DATA INPATH './TEMPCSV'
INTO TABLE DEFAULT.SALE

but the INPATH is not found:

org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV
Hi
tempCSV is just a temporary folder; it will be deleted after the data load into the Carbon table finishes. You can set some breakpoints and debug the example DataFrameAPIExample.scala, and you will find where the temp folder is created.

Regards
Liang
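If you want to see where a relative path like './tempCSV' actually resolves on your cluster, a small debugging snippet can help. This uses plain Hadoop FileSystem calls, not any CarbonData API, and is purely illustrative:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Resolve the relative path against the default filesystem and working directory,
// then report whether it exists and what the driver's local working directory is.
val fs = FileSystem.get(new Configuration())
val resolved = fs.makeQualified(new Path("./tempCSV"))
println("'./tempCSV' resolves to: " + resolved + ", exists: " + fs.exists(resolved))
println("Driver working directory: " + System.getProperty("user.dir"))

Running this from the same place the streaming job runs (for example inside foreachRDD on the driver) shows which directory the LOAD DATA statement is actually pointing at.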