Hi all,
I'm learning how to get started with CarbonData by following the tutorial: https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start.

I created a file named sample.csv under the path /home/hadoop/carbondata on the master node, and ran:

scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")

This throws an InvalidInputException even though the file actually exists. Here are the commands and logs:

scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/sample.csv

scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")
INFO  19-12 20:18:22,991 - main Query [LOAD DATA INPATH '/HOME/HADOOP/CARBONDATA/SAMPLE.CSV' INTO TABLE TEST_TABLE]
INFO  19-12 20:18:23,271 - Successfully able to get the table metadata file lock
INFO  19-12 20:18:23,276 - main Initiating Direct Load for the Table : (default.test_table)
INFO  19-12 20:18:23,279 - main Generate global dictionary from source data files!
INFO  19-12 20:18:23,296 - main [Block Distribution]
INFO  19-12 20:18:23,297 - main totalInputSpaceConsumed: 74 , defaultParallelism: 28
INFO  19-12 20:18:23,297 - main mapreduce.input.fileinputformat.split.maxsize: 16777216
INFO  19-12 20:18:23,380 - Block broadcast_0 stored as values in memory (estimated size 137.1 KB, free 137.1 KB)
INFO  19-12 20:18:23,397 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 15.0 KB, free 152.1 KB)
INFO  19-12 20:18:23,398 - Added broadcast_0_piece0 in memory on 172.17.195.12:46335 (size: 15.0 KB, free: 511.1 MB)
INFO  19-12 20:18:23,399 - Created broadcast 0 from NewHadoopRDD at CarbonTextFile.scala:73
ERROR 19-12 20:18:23,431 - main generate global dictionary failed
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /home/hadoop/carbondata/sample.csv
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:113)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
        ...

If any of you have met the same problem, could you tell me why this happens? Looking forward to your reply, thanks!
Hi
1. Is your input path on HDFS or local? Please double-check that the input path is correct; a quick way to verify it is shown in the sketch below.
2. As a newcomer, I suggest you open CarbonData in IntelliJ IDEA and run all the examples.
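A minimal sketch of such a check (this assumes you run it in the same spark-shell session, where sc is the SparkContext): ask Hadoop which filesystem a bare path resolves against, and whether your file exists there.

scala> import org.apache.hadoop.fs.{FileSystem, Path}
scala> val fs = FileSystem.get(sc.hadoopConfiguration)
scala> fs.getUri     // e.g. hdfs://master:9000 means bare paths resolve to HDFS
scala> fs.exists(new Path("/home/hadoop/carbondata/sample.csv"))
                     // false would explain the InvalidInputException: the file
                     // exists on the local disk but not on the default filesystem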
Regards
Liang
OK, thx~
It's a local path. In the error log, dataFilePath is shown as /home/hadoop/carbondata/sample.csv, which is exactly where my test file is located; see the log line:

    Input path does not exist: /home/hadoop/carbondata/sample.csv

In the following command, is the File class java.io.File?

scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
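If it is java.io.File, note that it resolves only against the local filesystem of the driver JVM, so it says nothing about HDFS. A minimal local check, run in the same shell:

scala> new java.io.File("/home/hadoop/carbondata/sample.csv").exists
       // true here, while fs.exists(...) on the default filesystem may still be false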
In reply to this post by Liang Chen
I've met this problem. As a newcomer, I don't know how to configure the details; I just keep the local path and the HDFS path identical.
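For example (a sketch, assuming the cluster's default filesystem is HDFS and sc is the spark-shell's SparkContext), one can copy the local file to the identical path on HDFS before running the load; hdfs dfs -put does the same from a terminal:

scala> import org.apache.hadoop.fs.{FileSystem, Path}
scala> val fs = FileSystem.get(sc.hadoopConfiguration)
scala> fs.copyFromLocalFile(new Path("file:///home/hadoop/carbondata/sample.csv"),
     |                      new Path("/home/hadoop/carbondata/sample.csv"))
       // the bare path now exists on HDFS too, so the original
       // LOAD DATA INPATH statement can find it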