Posted by 李寅威 on Jan 10, 2017; 6:44am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/load-data-error-from-csv-file-at-hdfs-error-in-standalone-spark-cluster-tp5783p5859.html
Thanks, Liang Chen.
I've solved the problem; here is my record:
First,
I found that the Spark job failed while loading data, with the error "CarbonDataWriterException: Problem while copying file from local store to carbon store". Tracing it to the source code at ./processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter, I found this:
  private void copyCarbonDataFileToCarbonStorePath(String localFileName)
      throws CarbonDataWriterException {
    long copyStartTime = System.currentTimeMillis();
    LOGGER.info("Copying " + localFileName + " --> " + dataWriterVo.getCarbonDataDirectoryPath());
    try {
      CarbonFile localCarbonFile =
          FileFactory.getCarbonFile(localFileName, FileFactory.getFileType(localFileName));
      String carbonFilePath = dataWriterVo.getCarbonDataDirectoryPath() + localFileName
          .substring(localFileName.lastIndexOf(File.separator));
      copyLocalFileToCarbonStore(carbonFilePath, localFileName,
          CarbonCommonConstants.BYTEBUFFER_SIZE,
          getMaxOfBlockAndFileSize(fileSizeInBytes, localCarbonFile.getSize()));
    } catch (IOException e) {
      throw new CarbonDataWriterException(
          "Problem while copying file from local store to carbon store");
    }
    LOGGER.info(
        "Total copy time (ms) to copy file " + localFileName + " is " + (System.currentTimeMillis()
            - copyStartTime));
  }
The cause is that the method copyLocalFileToCarbonStore throws an IOException, but the catch block discards it, so the log doesn't tell me the real reason for the error (at moments like this, I really prefer technical logs over business logs). So I added some logging to the catch block:
    ...
    catch (IOException e) {
      LOGGER.info("-------------------logs print by liyinwei start---------------------");
      LOGGER.error(e, "");
      LOGGER.info("-------------------logs print by liyinwei end ---------------------");
      throw new CarbonDataWriterException(
          "Problem while copying file from local store to carbon store");
Then I rebuilt the source code, and the log showed the following:
INFO 10-01 10:29:59,546 - [test_table: Graph - MDKeyGentest_table][partitionID:0] -------------------logs print by liyinwei start---------------------
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0]
java.io.FileNotFoundException: /home/hadoop/carbondata/bin/carbonshellstore/default/test_table/Fact/Part0/Segment_0/part-0-0-1484015398000.carbondata (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
...
INFO 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] -------------------logs print by liyinwei end ---------------------
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Problem while copying file from local store to carbon store
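As a side note, the extra log lines would not be needed if the catch block passed the IOException along as the cause of the new exception. Below is a minimal sketch of that idea. WriterException, CopyWithCause, copyToStore and copyOrFail are hypothetical names made up for illustration; they are not CarbonData classes, and I have not checked whether CarbonDataWriterException offers a constructor that takes a cause.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical stand-in for CarbonDataWriterException, for illustration only.
class WriterException extends RuntimeException {
    WriterException(String message, Throwable cause) {
        super(message, cause);
    }
}

class CopyWithCause {
    // Simulates a copy step that fails in the same way as the log above.
    static void copyToStore(String target) throws IOException {
        throw new FileNotFoundException(target + " (No such file or directory)");
    }

    // Wraps the IOException but keeps it as the cause, so the root error is not lost.
    static void copyOrFail(String target) {
        try {
            copyToStore(target);
        } catch (IOException e) {
            throw new WriterException(
                "Problem while copying file from local store to carbon store", e);
        }
    }

    public static void main(String[] args) {
        try {
            copyOrFail("/tmp/example/part-0-0.carbondata");
        } catch (WriterException e) {
            // The original FileNotFoundException is still reachable via getCause().
            System.out.println(e.getMessage());
            System.out.println("Caused by: " + e.getCause());
        }
    }
}
```

With the cause attached, any logger that prints the full stack trace would show the FileNotFoundException without a rebuild.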
Second,
as you can see, the root cause of the error is a FileNotFoundException: the target path under the carbon store does not exist. With the help of Liang Chen & Brave heart, I found that the default CarbonData storePath is as below when spark-shell is started via carbon-spark-shell:
scala> print(cc.storePath)
/home/hadoop/carbondata/bin/carbonshellstore
so I added a parameter when starting carbon-spark-shell:
./bin/carbon-spark-shell --conf spark.carbon.storepath=hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore
and then print the storePath:
scala> print(cc.storePath)
hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore
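The difference between the two storePath values comes down to the URI scheme. The sketch below shows one way to tell them apart using only java.net.URI. StorePathCheck and isLocalPath are hypothetical names, and treating a path without a scheme as local is an assumption that matches the behaviour seen above, not a statement about how CarbonData decides internally.

```java
import java.net.URI;

class StorePathCheck {
    // Returns true when the path has no scheme or the "file" scheme.
    // Assumption: such a path ends up on the local file system of each node.
    static boolean isLocalPath(String storePath) {
        String scheme = URI.create(storePath).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        // Default store path printed by cc.storePath before the fix.
        System.out.println(
            isLocalPath("/home/hadoop/carbondata/bin/carbonshellstore"));
        // Store path after passing spark.carbon.storepath with an hdfs:// prefix.
        System.out.println(
            isLocalPath("hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore"));
    }
}
```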
Finally,
I ran the command
cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into table test_table")
again and it succeeded, which I verified with:
cc.sql("select * from test_table").show
------------------ Original ------------------
From: "Liang Chen" <[hidden email]>
Date: Tue, Jan 10, 2017 12:11 PM
To: "dev" <[hidden email]>
Subject: Re: Problem while copying file from local store to carbon store
Hi
Please use spark-shell to create the CarbonContext; you can refer to these
articles:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67635497
Regards
Liang