Hi all,
I'm learning how to get started with CarbonData by following the tutorial: https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start.

I created a file named sample.csv under the path /home/hadoop/carbondata on the master node, and ran:

scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")

This throws an InvalidInputException even though the file actually exists. Here are the commands and logs:

scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/sample.csv

scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")
INFO  19-12 20:18:22,991 - main Query [LOAD DATA INPATH '/HOME/HADOOP/CARBONDATA/SAMPLE.CSV' INTO TABLE TEST_TABLE]
INFO  19-12 20:18:23,271 - Successfully able to get the table metadata file lock
INFO  19-12 20:18:23,276 - main Initiating Direct Load for the Table : (default.test_table)
INFO  19-12 20:18:23,279 - main Generate global dictionary from source data files!
INFO  19-12 20:18:23,296 - main [Block Distribution]
INFO  19-12 20:18:23,297 - main totalInputSpaceConsumed: 74 , defaultParallelism: 28
INFO  19-12 20:18:23,297 - main mapreduce.input.fileinputformat.split.maxsize: 16777216
INFO  19-12 20:18:23,380 - Block broadcast_0 stored as values in memory (estimated size 137.1 KB, free 137.1 KB)
INFO  19-12 20:18:23,397 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 15.0 KB, free 152.1 KB)
INFO  19-12 20:18:23,398 - Added broadcast_0_piece0 in memory on 172.17.195.12:46335 (size: 15.0 KB, free: 511.1 MB)
INFO  19-12 20:18:23,399 - Created broadcast 0 from NewHadoopRDD at CarbonTextFile.scala:73
ERROR 19-12 20:18:23,431 - main generate global dictionary failed
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /home/hadoop/carbondata/sample.csv
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:113)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
        ...

If any of you have met the same problem, could you tell me why this happens? Looking forward to your reply, thanks!
Hi
1. Is your input path on HDFS or local? Please double-check that the input path is correct; a quick way to verify it is shown in the sketch below.
2. As a newcomer, I suggest you open CarbonData in IntelliJ IDEA and run all the examples.
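A minimal sketch of such a check (this assumes you run it in the same spark-shell session, where sc is the SparkContext): ask Hadoop which filesystem a bare path resolves against, and whether your file exists there.

scala> import org.apache.hadoop.fs.{FileSystem, Path}
scala> val fs = FileSystem.get(sc.hadoopConfiguration)
scala> fs.getUri     // e.g. hdfs://master:9000 means bare paths resolve to HDFS
scala> fs.exists(new Path("/home/hadoop/carbondata/sample.csv"))
                     // false would explain the InvalidInputException: the file
                     // exists on the local disk but not on the default filesystem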
Regards
Liang
OK, thx~
It's a local path. In the error log, dataFilePath is shown as /home/hadoop/carbondata/sample.csv, which is exactly where my test file is located; see the log line:

    Input path does not exist: /home/hadoop/carbondata/sample.csv

In the following command, is the File class java.io.File?

scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
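If it is java.io.File, note that it resolves only against the local filesystem of the driver JVM, so it says nothing about HDFS. A minimal local check, run in the same shell:

scala> new java.io.File("/home/hadoop/carbondata/sample.csv").exists
       // true here, while fs.exists(...) on the default filesystem may still be false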
In reply to this post by Liang Chen
I've met this problem. As a newcomer, I don't know how to configure the details; I just keep the local path and the HDFS path identical.
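For example (a sketch, assuming the cluster's default filesystem is HDFS and sc is the spark-shell's SparkContext), one can copy the local file to the identical path on HDFS before running the load; hdfs dfs -put does the same from a terminal:

scala> import org.apache.hadoop.fs.{FileSystem, Path}
scala> val fs = FileSystem.get(sc.hadoopConfiguration)
scala> fs.copyFromLocalFile(new Path("file:///home/hadoop/carbondata/sample.csv"),
     |                      new Path("/home/hadoop/carbondata/sample.csv"))
       // the bare path now exists on HDFS too, so the original
       // LOAD DATA INPATH statement can find it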