Apache CarbonData Dev Mailing List archive

etl.DataLoadingException: The input file does not exist

李寅威

etl.DataLoadingException: The input file does not exist

Hi,

When I run the following script:

scala> val dataFilePath = new File("/carbondata/pt/sample.csv").getCanonicalPath
scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")

it turns out:

org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: hdfs://master:9000hdfs://master/opt/data/carbondata/pt/sample.csv
at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

I'm confused about why the string "hdfs://master:9000" appears before "hdfs://master/opt/data/carbondata/pt/sample.csv". I can't find any configuration that contains "hdfs://master:9000". Could anyone help me?

Liang Chen

Re: etl.DataLoadingException: The input file does not exist

Administrator

Hi

This happens because you are running in cluster mode, but the input file is a local file.
1. If you use cluster mode, please load files from Hadoop (HDFS).
2. If you just want to load local files, please use local mode.

李寅威

Re: etl.DataLoadingException: The input file does not exist

Well, in the CarbonData source code, the file type is determined as follows:

if (property.startsWith(CarbonUtil.HDFS_PREFIX)) {
    storeDefaultFileType = FileType.HDFS;
}

where CarbonUtil.HDFS_PREFIX = "hdfs://".

But when I run the following script, dataFilePath is still a local path:

scala> val dataFilePath = new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv
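
The result above follows from how java.io.File works: it has no notion of URI schemes, so a string beginning with "hdfs://" is treated as a relative local path and resolved against the working directory. The sketch below is only an illustration under that reading, assuming a Unix-like system; the class name LocalPathDemo and the helper hasHdfsScheme are hypothetical and are not part of CarbonData.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;

public class LocalPathDemo {

    // Hypothetical helper (not CarbonData code): true when the string
    // carries an explicit hdfs scheme that a URI parser can see.
    static boolean hasHdfsScheme(String path) {
        return "hdfs".equalsIgnoreCase(URI.create(path).getScheme());
    }

    public static void main(String[] args) throws IOException {
        String input = "hdfs://master:9000/carbondata/sample.csv";
        File file = new File(input);

        // java.io.File sees a relative local path, so the canonical
        // form is resolved against the current working directory.
        System.out.println("absolute?  " + file.isAbsolute());
        System.out.println("canonical: " + file.getCanonicalPath());

        // A startsWith("hdfs://") check passes on the raw string but
        // not on the canonical path returned by java.io.File.
        System.out.println("raw has prefix:       " + input.startsWith("hdfs://"));
        System.out.println("canonical has prefix: "
                + file.getCanonicalPath().startsWith("hdfs://"));

        // A URI parser, by contrast, does recognise the scheme.
        System.out.println("URI scheme: " + URI.create(input).getScheme());
        System.out.println("hasHdfsScheme: " + hasHdfsScheme(input));
    }
}
```

In other words, passing an HDFS location through new File(...).getCanonicalPath turns it into a local path, so the HDFS URL should be handed to the load command as a plain string.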

David CaiQiang

Re: Re: etl.DataLoadingException: The input file does not exist

Please find the following item in the carbon.properties file and set it to a proper path (for example, hdfs://master:9000/):
carbon.ddl.base.hdfs.url

During loading, CarbonData combines this URL with the data file path.

BTW, it is better to mention the version number you are using.

Best Regards
David Cai
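
The doubled prefix in the original error message is what a plain concatenation of a base URL and an already-qualified path would produce. The sketch below illustrates that reading only; whether CarbonData joins the two values exactly this way is an assumption, and the class BaseUrlDemo and both helper methods are hypothetical names introduced for the example.

```java
public class BaseUrlDemo {

    static final String HDFS_PREFIX = "hdfs://";

    // Hypothetical: blind concatenation of base URL and data file path.
    static String naiveJoin(String baseUrl, String path) {
        return baseUrl + path;
    }

    // Hypothetical: prepend the base URL only when the path does not
    // already start with hdfs://, and keep exactly one slash between them.
    static String qualify(String baseUrl, String path) {
        if (path.startsWith(HDFS_PREFIX)) {
            return path;
        }
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        String rel = path.startsWith("/") ? path : "/" + path;
        return base + rel;
    }

    public static void main(String[] args) {
        String base = "hdfs://master:9000";

        // Reproduces the shape of the path reported in the exception.
        System.out.println(naiveJoin(base, "hdfs://master/opt/data/carbondata/pt/sample.csv"));

        // A bare path gets the base URL; a qualified one is left alone.
        System.out.println(qualify(base, "/carbondata/pt/sample.csv"));
        System.out.println(qualify(base, "hdfs://master:9000/carbondata/pt/sample.csv"));
    }
}
```

Either setting carbon.ddl.base.hdfs.url and passing a bare path, or passing a full hdfs:// URL in the load command, avoids mixing the two forms.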

manishgupta88

Re: Re: etl.DataLoadingException: The input file does not exist

Hi 251469031,

Thanks for showing interest in CarbonData. For your question, please refer to the
explanation below.

scala> val dataFilePath = new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv

If you use new File, the path is always resolved against the local file
system. So, if you are not including the HDFS URL in the file/folder
path of the LOAD DATA command, you can configure
carbon.ddl.base.hdfs.url in the carbon.properties file, as suggested by
QiangCai.

carbon.ddl.base.hdfs.url=hdfs://<IP>:<port>

Example:
carbon.ddl.base.hdfs.url=hdfs://9.82.101.42:54310

Regards
Manish Gupta

李寅威

Re: etl.DataLoadingException: The input file does not exist

Oh, I see. I've solved it. Thanks very much to Manish and QiangCai!

Here is my DML script:
cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/pt/sample.csv' into table test_table")
