save dataframe error, why loading ./TEMPCSV ?


save dataframe error, why loading ./TEMPCSV ?

Li Peng
Hi,
   I use Spark Streaming to consume from Kafka. When I save the DataFrame to a Carbon table, why does the log show "LOAD DATA INPATH './TEMPCSV'"?

Following is the code.

result.show()
result.write
  .format("carbondata")
  .option("tableName", "sale")
  .mode(SaveMode.Append)
  .save()
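
For reference, this write runs inside a DStream.foreachRDD block (DStream.foreachRDD and SaleStoreApp.ToCarbon show up in the stack trace below). A rough sketch of that wiring (the names saleStream, carbonContext and saleSchema are placeholders, not my exact code):

import org.apache.spark.sql.SaveMode

// Rough sketch only; saleStream, carbonContext and saleSchema are placeholders.
saleStream.foreachRDD { rdd =>
  val result = carbonContext.createDataFrame(rdd, saleSchema)
  result.registerTempTable("sale_tmp")   // queried as "SELECT * FROM SALE_TMP" in the log
  result.show()
  result.write
    .format("carbondata")
    .option("tableName", "sale")
    .mode(SaveMode.Append)
    .save()
}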


Following are the details. Thanks.
 
INFO  14-12 10:25:00,190 - Starting job streaming job 1481682300000 ms.0 from job set of time 1481682300000 ms
INFO  14-12 10:25:00,244 - streaming-job-executor-0 Property file path: /data08/hadoop/yarn/local/usercache/hdfs/appcache/application_1481679069818_0015/container_e16_1481679069818_0015_01_000001/../../../conf/carbon.properties
INFO  14-12 10:25:00,245 - streaming-job-executor-0 ------Using Carbon.properties --------
INFO  14-12 10:25:00,245 - streaming-job-executor-0 {}
INFO  14-12 10:25:00,475 - streaming-job-executor-0 Table block size not specified for default_carbontest. Therefore considering the default value 1024 MB
INFO  14-12 10:25:00,669 - streaming-job-executor-0 Table block size not specified for default_default_table. Therefore considering the default value 1024 MB
INFO  14-12 10:25:00,682 - streaming-job-executor-0 Table block size not specified for default_sale. Therefore considering the default value 1024 MB
INFO  14-12 10:25:01,010 - streaming-job-executor-0 Query [SELECT * FROM SALE_TMP]
INFO  14-12 10:25:01,294 - Parsing command: select * from sale_tmp
INFO  14-12 10:25:02,020 - Parse Completed
INFO  14-12 10:25:02,037 - Parsing command: select * from sale_tmp
INFO  14-12 10:25:02,038 - Parse Completed
INFO  14-12 10:25:02,319 - Starting job: show at SaleStoreApp.scala:33
INFO  14-12 10:25:02,331 - Got job 0 (show at SaleStoreApp.scala:33) with 1 output partitions
INFO  14-12 10:25:02,332 - Final stage: ResultStage 0 (show at SaleStoreApp.scala:33)
INFO  14-12 10:25:02,332 - Parents of final stage: List()
INFO  14-12 10:25:02,334 - Missing parents: List()
INFO  14-12 10:25:02,340 - Submitting ResultStage 0 (MapPartitionsRDD[8] at show at SaleStoreApp.scala:33), which has no missing parents
INFO  14-12 10:25:02,403 - Block broadcast_0 stored as values in memory (estimated size 10.4 KB, free 10.4 KB)
INFO  14-12 10:25:02,412 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.9 KB, free 15.3 KB)
INFO  14-12 10:25:02,413 - Added broadcast_0_piece0 in memory on 192.168.9.4:49953 (size: 4.9 KB, free: 1823.2 MB)
INFO  14-12 10:25:02,415 - Created broadcast 0 from broadcast at DAGScheduler.scala:1008
INFO  14-12 10:25:02,419 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[8] at show at SaleStoreApp.scala:33)
INFO  14-12 10:25:02,420 - Adding task set 0.0 with 1 tasks
INFO  14-12 10:25:02,444 - Starting task 0.0 in stage 0.0 (TID 0, dpnode05, partition 0,NODE_LOCAL, 2012 bytes)
INFO  14-12 10:25:02,912 - Added broadcast_0_piece0 in memory on dpnode05:35812 (size: 4.9 KB, free: 3.4 GB)
INFO  14-12 10:25:04,648 - Finished task 0.0 in stage 0.0 (TID 0) in 2213 ms on dpnode05 (1/1)
INFO  14-12 10:25:04,652 - Removed TaskSet 0.0, whose tasks have all completed, from pool
INFO  14-12 10:25:04,656 - ResultStage 0 (show at SaleStoreApp.scala:33) finished in 2.224 s
INFO  14-12 10:25:04,664 - Job 0 finished: show at SaleStoreApp.scala:33, took 2.344778 s
INFO  14-12 10:25:04,678 - Starting job: show at SaleStoreApp.scala:33
INFO  14-12 10:25:04,681 - Got job 1 (show at SaleStoreApp.scala:33) with 2 output partitions
INFO  14-12 10:25:04,681 - Final stage: ResultStage 1 (show at SaleStoreApp.scala:33)
INFO  14-12 10:25:04,681 - Parents of final stage: List()
INFO  14-12 10:25:04,682 - Missing parents: List()
INFO  14-12 10:25:04,683 - Submitting ResultStage 1 (MapPartitionsRDD[8] at show at SaleStoreApp.scala:33), which has no missing parents
INFO  14-12 10:25:04,695 - Block broadcast_1 stored as values in memory (estimated size 10.4 KB, free 25.7 KB)
INFO  14-12 10:25:04,698 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.9 KB, free 30.5 KB)
INFO  14-12 10:25:04,700 - Added broadcast_1_piece0 in memory on 192.168.9.4:49953 (size: 4.9 KB, free: 1823.2 MB)
INFO  14-12 10:25:04,701 - Created broadcast 1 from broadcast at DAGScheduler.scala:1008
INFO  14-12 10:25:04,702 - Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[8] at show at SaleStoreApp.scala:33)
INFO  14-12 10:25:04,702 - Adding task set 1.0 with 2 tasks
INFO  14-12 10:25:04,706 - Starting task 1.0 in stage 1.0 (TID 1, dpnode03, partition 2,NODE_LOCAL, 2012 bytes)
INFO  14-12 10:25:04,708 - Starting task 0.0 in stage 1.0 (TID 2, dpnode04, partition 1,NODE_LOCAL, 2012 bytes)
INFO  14-12 10:25:04,949 - Added broadcast_1_piece0 in memory on dpnode03:36367 (size: 4.9 KB, free: 3.4 GB)
INFO  14-12 10:25:05,159 - Added broadcast_1_piece0 in memory on dpnode04:41109 (size: 4.9 KB, free: 3.4 GB)
INFO  14-12 10:25:06,587 - Finished task 1.0 in stage 1.0 (TID 1) in 1881 ms on dpnode03 (1/2)
INFO  14-12 10:25:06,868 - ResultStage 1 (show at SaleStoreApp.scala:33) finished in 2.164 s
INFO  14-12 10:25:06,868 - Finished task 0.0 in stage 1.0 (TID 2) in 2160 ms on dpnode04 (2/2)
INFO  14-12 10:25:06,868 - Job 1 finished: show at SaleStoreApp.scala:33, took 2.189452 s
INFO  14-12 10:25:06,868 - Removed TaskSet 1.0, whose tasks have all completed, from pool
+--------------------+--------------------+------------+-------------+-------+-------------+-------------+-------+-----------+------------+------------+---------+---------+--------+
|                  id|            store_id|  order_code|checkout_date|saleamt|invoice_price|bill_sale_off|giveamt|member_code|cashier_code|cashier_name|item_cont|give_type|saletype|
+--------------------+--------------------+------------+-------------+-------+-------------+-------------+-------+-----------+------------+------------+---------+---------+--------+
|f34d3a92-d3d8-41d...|d166fd97-93dd-414...|XS1608250002|1472116276000|    4.0|          4.0|          0.0|    0.0|           |    00000001|       admin|      1.0|      0.0|       1|
|26399c17-858c-4a8...|d166fd97-93dd-414...|XS1608250003|1472116308000|    9.0|          9.0|          0.0|    0.0|           |    00000001|       admin|      2.0|      0.0|       1|
|1cbe7ac2-d754-464...|d166fd97-93dd-414...|XS1608250001|1472116093000|   12.5|         12.5|          0.0|    0.0|           |    00000001|       admin|      2.0|      0.0|       1|
|1532bde2-ae14-404...|d166fd97-93dd-414...|XS1608250004|1472116363000|  360.0|        360.0|          0.0|    0.0|           |    00000001|       admin|      3.0|      0.0|       1|
+--------------------+--------------------+------------+-------------+-------+-------------+-------------+-------+-----------+------------+------------+---------+---------+--------+

INFO  14-12 10:25:06,971 - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
INFO  14-12 10:25:06,971 - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
INFO  14-12 10:25:06,971 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
INFO  14-12 10:25:06,972 - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
INFO  14-12 10:25:06,972 - mapred.job.id is deprecated. Instead, use mapreduce.job.id
INFO  14-12 10:25:06,975 - File Output Committer Algorithm version is 2
INFO  14-12 10:25:06,975 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
INFO  14-12 10:25:07,129 - Starting job: saveAsTextFile at package.scala:169
INFO  14-12 10:25:07,130 - Got job 2 (saveAsTextFile at package.scala:169) with 3 output partitions
INFO  14-12 10:25:07,130 - Final stage: ResultStage 2 (saveAsTextFile at package.scala:169)
INFO  14-12 10:25:07,130 - Parents of final stage: List()
INFO  14-12 10:25:07,131 - Missing parents: List()
INFO  14-12 10:25:07,131 - Submitting ResultStage 2 (MapPartitionsRDD[12] at saveAsTextFile at package.scala:169), which has no missing parents
INFO  14-12 10:25:07,181 - Block broadcast_2 stored as values in memory (estimated size 109.9 KB, free 140.4 KB)
INFO  14-12 10:25:07,184 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 41.6 KB, free 182.0 KB)
INFO  14-12 10:25:07,185 - Added broadcast_2_piece0 in memory on 192.168.9.4:49953 (size: 41.6 KB, free: 1823.2 MB)
INFO  14-12 10:25:07,185 - Created broadcast 2 from broadcast at DAGScheduler.scala:1008
INFO  14-12 10:25:07,186 - Submitting 3 missing tasks from ResultStage 2 (MapPartitionsRDD[12] at saveAsTextFile at package.scala:169)
INFO  14-12 10:25:07,186 - Adding task set 2.0 with 3 tasks
INFO  14-12 10:25:07,188 - Starting task 2.0 in stage 2.0 (TID 3, dpnode03, partition 2,NODE_LOCAL, 2012 bytes)
INFO  14-12 10:25:07,188 - Starting task 0.0 in stage 2.0 (TID 4, dpnode05, partition 0,NODE_LOCAL, 2012 bytes)
INFO  14-12 10:25:07,189 - Starting task 1.0 in stage 2.0 (TID 5, dpnode04, partition 1,NODE_LOCAL, 2012 bytes)
INFO  14-12 10:25:07,222 - Added broadcast_2_piece0 in memory on dpnode03:36367 (size: 41.6 KB, free: 3.4 GB)
INFO  14-12 10:25:07,310 - Added broadcast_2_piece0 in memory on dpnode04:41109 (size: 41.6 KB, free: 3.4 GB)
INFO  14-12 10:25:07,319 - Added broadcast_2_piece0 in memory on dpnode05:35812 (size: 41.6 KB, free: 3.4 GB)
INFO  14-12 10:25:08,554 - Finished task 2.0 in stage 2.0 (TID 3) in 1366 ms on dpnode03 (1/3)
INFO  14-12 10:25:08,931 - Finished task 0.0 in stage 2.0 (TID 4) in 1742 ms on dpnode05 (2/3)
INFO  14-12 10:25:09,210 - Finished task 1.0 in stage 2.0 (TID 5) in 2021 ms on dpnode04 (3/3)
INFO  14-12 10:25:09,210 - ResultStage 2 (saveAsTextFile at package.scala:169) finished in 2.023 s
INFO  14-12 10:25:09,210 - Removed TaskSet 2.0, whose tasks have all completed, from pool
INFO  14-12 10:25:09,211 - Job 2 finished: saveAsTextFile at package.scala:169, took 2.081483 s
INFO  14-12 10:25:09,334 - temporary CSV file size: 5.512237548828125E-4 MB
INFO  14-12 10:25:09,336 - streaming-job-executor-0 Query [
          LOAD DATA INPATH './TEMPCSV'
          INTO TABLE DEFAULT.SALE
          OPTIONS ('FILEHEADER' = 'ID,STORE_ID,ORDER_CODE,CHECKOUT_DATE,SALEAMT,INVOICE_PRICE,BILL_SALE_OFF,GIVEAMT,MEMBER_CODE,CASHIER_CODE,CASHIER_NAME,ITEM_CONT,GIVE_TYPE,SALETYPE')
      ]
INFO  14-12 10:25:09,625 - Successfully able to get the table metadata file lock
AUDIT 14-12 10:25:09,630 - [dpnode04][hdfs][Thread-188]Dataload failed for default.sale. The input file does not exist: ./tempCSV
INFO  14-12 10:25:09,631 - streaming-job-executor-0 Successfully deleted the lock file /data08/hadoop/yarn/local/usercache/hdfs/appcache/application_1481679069818_0015/container_e16_1481679069818_0015_01_000001/tmp/default/sale/meta.lock
INFO  14-12 10:25:09,632 - Table MetaData Unlocked Successfully after data load
INFO  14-12 10:25:09,666 - Finished job streaming job 1481682300000 ms.0 from job set of time 1481682300000 ms
INFO  14-12 10:25:09,668 - Total delay: 9.662 s for time 1481682300000 ms (execution: 9.476 s)
ERROR 14-12 10:25:09,672 - Error running job streaming job 1481682300000 ms.0
org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV
        at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.util.FileUtils$.getPaths(FileUtils.scala:62)
        at org.apache.spark.sql.execution.command.LoadTableUsingKettle.run(carbonTableSchema.scala:1096)
        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1036)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
        at org.apache.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:137)
        at org.apache.carbondata.spark.CarbonDataFrameWriter.loadTempCSV(CarbonDataFrameWriter.scala:84)
        at org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:49)
        at org.apache.carbondata.spark.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:42)
        at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:112)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
        at cn.com.jldata.app.SaleStoreApp$$anonfun$ToCarbon$1.apply(SaleStoreApp.scala:38)
        at cn.com.jldata.app.SaleStoreApp$$anonfun$ToCarbon$1.apply(SaleStoreApp.scala:25)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
ERROR 14-12 10:25:09,675 - User class threw exception: org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV
org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV
        at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.util.FileUtils$.getPaths(FileUtils.scala:62)
        at org.apache.spark.sql.execution.command.LoadTableUsingKettle.run(carbonTableSchema.scala:1096)
        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1036)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
        at org.apache.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:137)
        at org.apache.carbondata.spark.CarbonDataFrameWriter.loadTempCSV(CarbonDataFrameWriter.scala:84)
        at org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:49)
        at org.apache.carbondata.spark.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:42)
        at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:112)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
        at cn.com.jldata.app.SaleStoreApp$$anonfun$ToCarbon$1.apply(SaleStoreApp.scala:38)
        at cn.com.jldata.app.SaleStoreApp$$anonfun$ToCarbon$1.apply(SaleStoreApp.scala:25)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
INFO  14-12 10:25:09,677 - Deleting batches ArrayBuffer()
INFO  14-12 10:25:09,679 - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV)
INFO  14-12 10:25:09,685 - remove old batch metadata:
INFO  14-12 10:25:09,687 - Invoking stop(stopGracefully=false) from shutdown hook
INFO  14-12 10:25:09,690 - Stopping JobGenerator immediately
INFO  14-12 10:25:09,692 - Stopped timer for JobGenerator after time 1481682300000
INFO  14-12 10:25:09,694 - Stopped JobGenerator
INFO  14-12 10:25:09,697 - Stopped JobScheduler
INFO  14-12 10:25:09,704 - stopped o.s.j.s.ServletContextHandler{/streaming,null}
INFO  14-12 10:25:09,706 - stopped o.s.j.s.ServletContextHandler{/streaming/batch,null}
INFO  14-12 10:25:09,709 - stopped o.s.j.s.ServletContextHandler{/static/streaming,null}
INFO  14-12 10:25:09,711 - StreamingContext stopped successfully

Re: save dataframe error, why loading ./TEMPCSV ?

Liang Chen
Administrator
Hi

Because data loading into CarbonData currently only supports CSV files, the write in your code goes through these steps: DataFrame -> CSV files -> load data into the Carbon table.

This is why a "TempCSV" folder appears.

The next version (1.0.0) will optimize this so that the DataFrame is written directly into the Carbon table.
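
Roughly, the 0.2.x DataFrame writer's append path is equivalent to the two steps in this simplified sketch (not the actual CarbonDataFrameWriter code; cc stands for a CarbonContext, and the temp path and truncated column header are illustrative):

// Simplified sketch of the append path: DataFrame -> temporary CSV -> LOAD DATA.
val tempCsvPath = "hdfs:///tmp/sale/tempCSV"   // illustrative path; the real writer derives its own

// Step 1: dump the DataFrame rows as CSV text files.
result.rdd
  .map(row => row.mkString(","))
  .saveAsTextFile(tempCsvPath)

// Step 2: load the temporary CSV folder into the Carbon table via SQL.
cc.sql(
  s"""LOAD DATA INPATH '$tempCsvPath'
     |INTO TABLE default.sale
     |OPTIONS ('FILEHEADER' = 'id,store_id,order_code,checkout_date,...')
   """.stripMargin)   // header truncated here; the full column list is in the log above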

Regards
Liang

Re: save dataframe error, why loading ./TEMPCSV ?

Li Peng
Thanks.
    I am using CarbonData version 0.2.0.
    In the step DataFrame -> CSV files -> load data into the Carbon table, I don't know where the CSV files are stored.
   The log shows:
          LOAD DATA INPATH './TEMPCSV'
          INTO TABLE DEFAULT.SALE
   but the INPATH is not found:
   org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: ./tempCSV


   

Re: save dataframe error, why loading ./TEMPCSV ?

Liang Chen-2
Hi

tempCSV is just a temporary folder; it is deleted after the data load into the Carbon table finishes.
You can set breakpoints and debug the example DataFrameAPIExample.scala, and you will find the temp folder.
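
If you want to see where a relative path like ./tempCSV would resolve on your cluster, a quick check from the driver with the plain Hadoop FileSystem API (nothing CarbonData-specific, assuming the Hadoop configuration is on the classpath) is:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Where do relative paths resolve on the default filesystem, and does
// ./tempCSV exist there at the moment you look?
val fs = FileSystem.get(new Configuration())
println(fs.getWorkingDirectory)            // base directory for relative paths
println(fs.exists(new Path("./tempCSV")))  // false here, matching the DataLoadingException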

Regards
Liang


