etl.DataLoadingException: The input file does not exist


etl.DataLoadingException: The input file does not exist

李寅威
Hi,

When I run the following script:


scala> import java.io.File
scala> val dataFilePath = new File("/carbondata/pt/sample.csv").getCanonicalPath
scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")


it turns out:


org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: hdfs://master:9000hdfs://master/opt/data/carbondata/pt/sample.csv
        at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)


I am confused about why the string "hdfs://master:9000" appears before "hdfs://master/opt/data/carbondata/pt/sample.csv". I can't find any configuration that contains "hdfs://master:9000". Could anyone help me?

Re: etl.DataLoadingException: The input file does not exist

Liang Chen
Administrator
Hi

This is because you are running in cluster mode, but the input file is a local file.
1. If you use cluster mode, please load files from Hadoop (HDFS).
2. If you just want to load local files, please use local mode.


Re: etl.DataLoadingException: The input file does not exist

李寅威
Well, in the CarbonData source code, the file type is determined as follows:


if (property.startsWith(CarbonUtil.HDFS_PREFIX)) {
  storeDefaultFileType = FileType.HDFS;
}


and CarbonUtil.HDFS_PREFIX = "hdfs://"


But when I run the following script, dataFilePath is still a local path:


scala> val dataFilePath = new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv





------------------ Original Message ------------------
From: "Liang Chen" <[hidden email]>
Sent: Thursday, December 22, 2016, 8:47 PM
To: "dev" <[hidden email]>

Subject: Re: etl.DataLoadingException: The input file does not exist



Hi

This is because you are running in cluster mode, but the input file is a local file.
1. If you use cluster mode, please load files from Hadoop (HDFS).
2. If you just want to load local files, please use local mode.



Re: Re: etl.DataLoadingException: The input file does not exist

David CaiQiang
Please find the following item in the carbon.properties file and give it a proper path (hdfs://master:9000/):
carbon.ddl.base.hdfs.url

During loading, this URL is combined with the data file path.

BTW, it would be better to provide the version number.
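
To illustrate what "combining" the base URL with the data file path can lead to, here is a minimal sketch. It assumes plain string concatenation, which matches the shape of the path in the error message but is not taken from the CarbonData source. The names BaseUrlDemo, naive and resolve are hypothetical and exist only for this example.

```scala
object BaseUrlDemo {
  // Assumption for illustration only: the base URL is simply prepended.
  // This is NOT CarbonData's actual code.
  def naive(baseUrl: String, path: String): String = baseUrl + path

  // Hypothetical safer variant: leave paths that already carry an
  // hdfs:// prefix untouched, otherwise join with exactly one slash.
  def resolve(baseUrl: String, path: String): String =
    if (path.startsWith("hdfs://")) path
    else baseUrl.stripSuffix("/") + "/" + path.stripPrefix("/")

  def main(args: Array[String]): Unit = {
    // Same shape as the path reported in the exception
    println(naive("hdfs://master:9000", "hdfs://master/opt/data/carbondata/pt/sample.csv"))
    // A path without a scheme gets the base URL prepended
    println(resolve("hdfs://master:9000/", "/carbondata/pt/sample.csv"))
  }
}
```

With the naive variant, a path that already starts with "hdfs://" ends up with two prefixes, which is what the exception shows.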
Best Regards
David Cai

Re: Re: etl.DataLoadingException: The input file does not exist

manishgupta88
Hi 251469031,

Thanks for showing interest in CarbonData. For your question, please refer to the
explanation below.

scala> val dataFilePath = new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv

If you use new File, it will always return a path on the local file system.
So in case you are not prepending the HDFS URL to the file/folder path in
the LOAD DATA DDL command, you can configure carbon.ddl.base.hdfs.url in
the carbon.properties file, as suggested by QiangCai.

carbon.ddl.base.hdfs.url=hdfs://<IP>:<port>

Example:
carbon.ddl.base.hdfs.url=hdfs://9.82.101.42:54310
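
The point about new File can be checked with a short sketch using only the JDK. It contrasts java.io.File, which has no notion of URI schemes, with java.net.URI, which parses the scheme. PathSchemeDemo, asLocalFile and schemeOf are illustrative names, not part of CarbonData, and the printed local path depends on the current working directory.

```scala
import java.io.File
import java.net.URI

object PathSchemeDemo {
  // java.io.File treats the whole string as a (relative) local path,
  // so the result is resolved against the current working directory.
  def asLocalFile(path: String): String = new File(path).getCanonicalPath

  // java.net.URI parses the scheme; a plain path has none.
  def schemeOf(path: String): Option[String] = Option(new URI(path).getScheme)

  def main(args: Array[String]): Unit = {
    val hdfsPath = "hdfs://master:9000/carbondata/sample.csv"
    println(asLocalFile(hdfsPath))                 // a local path, no longer an hdfs:// URL
    println(schemeOf(hdfsPath))                    // Some(hdfs)
    println(schemeOf("/carbondata/pt/sample.csv")) // None
  }
}
```

So an HDFS location should be passed to the load command as a string, not routed through java.io.File.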

Regards
Manish Gupta

On Fri, Dec 23, 2016 at 10:09 AM, QiangCai <[hidden email]> wrote:

> Please find the following item in carbon.properties file, give a proper
> path(hdfs://master:9000/)
> carbon.ddl.base.hdfs.url
>
> During loading, will combine this url and data file path.
>
> BTW, better to provide the version number.

Re: etl.DataLoadingException: The input file does not exist

李寅威
Oh, I see. I've solved it. Thanks very much to Manish and QiangCai!


Here is my DML script:
cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/pt/sample.csv' into table test_table")
 




------------------ Original Message ------------------
From: "manish gupta" <[hidden email]>
Sent: Friday, December 23, 2016, 2:32 PM
To: "dev" <[hidden email]>

Subject: Re: Re: etl.DataLoadingException: The input file does not exist


