Hi all,
When I load data from a CSV file on HDFS, one stage of the Spark job fails with the error below. Where can I find a more detailed error message that would help me find the cause, or does anyone know why this happens and how to solve it?

command:

cc.sql(s"load data inpath 'hdfs://master:9000/opt/sample.csv' into table test_table")

error log:

Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 17, slave2): org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
Hi all:
When I load data from HDFS into a table with

cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into table test_table")

two errors occurred.

On slave1:

INFO 09-01 16:17:58,611 - test_table: Graph - CSV Input *****************Started all csv reading***********
INFO 09-01 16:17:58,611 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *****************started csv reading by thread***********
INFO 09-01 16:17:58,635 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Total Number of records processed by this thread is: 3
INFO 09-01 16:17:58,635 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Time taken to processed 3 Number of records: 24
INFO 09-01 16:17:58,636 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *****************Completed csv reading by thread***********
INFO 09-01 16:17:58,636 - test_table: Graph - CSV Input *****************Completed all csv reading***********
INFO 09-01 16:17:58,642 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Column cache size not configured. Therefore default behavior will be considered and no LRU based eviction of columns will be done
ERROR 09-01 16:17:58,645 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] org.apache.carbondata.core.util.CarbonUtilException: Either dictionary or its metadata does not exist for column identifier :: ColumnIdentifier [columnId=c70480f9-4336-4186-8bd0-a3bebb50ea6a]
ERROR 09-01 16:17:58,646 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] org.pentaho.di.core.exception.KettleException: org.apache.carbondata.core.util.CarbonUtilException: Either dictionary or its metadata does not exist for column identifier :: ColumnIdentifier [columnId=c70480f9-4336-4186-8bd0-a3bebb50ea6a]
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.initDictionaryCacheInfo(FileStoreSurrogateKeyGenForCSV.java:297)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.populateCache(FileStoreSurrogateKeyGenForCSV.java:270)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.<init>(FileStoreSurrogateKeyGenForCSV.java:144)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.processRow(CarbonCSVBasedSeqGenStep.java:385)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    at java.lang.Thread.run(Thread.java:745)
INFO 09-01 16:17:58,647 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Record Procerssed For table: test_table
INFO 09-01 16:17:58,647 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Summary: Carbon Slice Merger Step: Read: 0: Write: 0
INFO 09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Record Processed For table: test_table
INFO 09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Number of Records was Zero
INFO 09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Summary: Carbon Sort Key Step: Read: 0: Write: 0
INFO 09-01 16:17:58,747 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Graph execution is finished.
ERROR 09-01 16:17:58,748 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Graph Execution had errors
INFO 09-01 16:17:58,749 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Deleted the local store location/tmp/259202084415620/0
INFO 09-01 16:17:58,749 - DataLoad complete
INFO 09-01 16:17:58,749 - Data Loaded successfully with LoadCount:0
INFO 09-01 16:17:58,749 - DataLoad failure
ERROR 09-01 16:17:58,749 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
ERROR 09-01 16:17:58,752 - Exception in task 0.3 in stage 3.0 (TID 8) org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

On slave2:

INFO 09-01 16:17:55,182 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Copying /tmp/259188927254235/0/default/test_table/Fact/Part0/Segment_0/0/part-0-0-1483949874000.carbondata --> /home/hadoop/carbondata/bin/carbonshellstore/default/test_table/Fact/Part0/Segment_0
INFO 09-01 16:17:55,182 - [test_table: Graph - MDKeyGentest_table][partitionID:0] The configured block size is 1024 MB, the actual carbon file size is 921 Byte, choose the max value 1024 MB as the block size on HDFS
ERROR 09-01 16:17:55,183 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Problem while copying file from local store to carbon store
org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException: Problem while copying file from local store to carbon store
    at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.copyCarbonDataFileToCarbonStorePath(AbstractFactDataWriter.java:604)
    at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.closeWriter(AbstractFactDataWriter.java:510)
    at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:879)
    at org.apache.carbondata.processing.mdkeygen.MDKeyGenStep.processingComplete(MDKeyGenStep.java:245)
    at org.apache.carbondata.processing.mdkeygen.MDKeyGenStep.processRow(MDKeyGenStep.java:234)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    at java.lang.Thread.run(Thread.java:745)
INFO 09-01 16:17:55,184 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Record Procerssed For table: test_table
INFO 09-01 16:17:55,184 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Summary: Carbon Slice Merger Step: Read: 1: Write: 0
INFO 09-01 16:17:55,284 - [Executor task launch worker-0][partitionID:default_test_table_c3017cd2-8920-488d-a715-c0d02250148e] Graph execution is finished.
ERROR 09-01 16:17:55,284 - [Executor task launch worker-0][partitionID:default_test_table_c3017cd2-8920-488d-a715-c0d02250148e] Graph Execution had errors
INFO 09-01 16:17:55,285 - [Executor task launch worker-0][partitionID:default_test_table_c3017cd2-8920-488d-a715-c0d02250148e] Deleted the local store location/tmp/259188927254235/0
INFO 09-01 16:17:55,285 - DataLoad complete
INFO 09-01 16:17:55,286 - Data Loaded successfully with LoadCount:0
INFO 09-01 16:17:55,286 - DataLoad failure
ERROR 09-01 16:17:55,286 - [Executor task launch worker-0][partitionID:default_test_table_c3017cd2-8920-488d-a715-c0d02250148e] org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
ERROR 09-01 16:17:55,288 - Exception in task 0.0 in stage 3.0 (TID 5) org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
INFO 09-01 16:17:55,926 - Got assigned task 7
INFO 09-01 16:17:55,926 - Running task 0.2 in stage 3.0 (TID 7)
INFO 09-01 16:17:55,930 - Input split: slave2
INFO 09-01 16:17:55,930 - The Block Count in this node :1
INFO 09-01 16:17:55,931 - [Executor task launch worker-0][partitionID:default_test_table_fa5212b0-3e3c-43e1-ae5e-27396dce020c] ************* Is Columnar Storagetrue
INFO 09-01 16:17:56,011 - [Executor task launch worker-0][partitionID:default_test_table_fa5212b0-3e3c-43e1-ae5e-27396dce020c] Kettle environment initialized
INFO 09-01 16:17:56,027 - [Executor task launch worker-0][partitionID:default_test_table_fa5212b0-3e3c-43e1-ae5e-27396dce020c] ** Using csv file **
INFO 09-01 16:17:56,035 - [Executor task launch worker-0][partitionID:default_test_table_fa5212b0-3e3c-43e1-ae5e-27396dce020c] Graph execution is started /tmp/259190107897964/0/etl/default/test_table/0/0/test_table.ktr
INFO 09-01 16:17:56,035 - test_table: Graph - CSV Input *****************Started all csv reading***********
INFO 09-01 16:17:56,035 - [pool-31-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-31-thread-1] *****************started csv reading by thread***********
INFO 09-01 16:17:56,040 - [pool-31-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-31-thread-1] Total Number of records processed by this thread is: 3
INFO 09-01 16:17:56,041 - [pool-31-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-31-thread-1] Time taken to processed 3 Number of records: 6
INFO 09-01 16:17:56,041 - [pool-31-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-31-thread-1] *****************Completed csv reading by thread***********
INFO 09-01 16:17:56,041 - test_table: Graph - CSV Input *****************Completed all csv reading***********
INFO 09-01 16:17:56,043 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Sort size for table: 500000
INFO 09-01 16:17:56,043 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Number of intermediate file to be merged: 20
INFO 09-01 16:17:56,043 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] File Buffer Size: 1048576
INFO 09-01 16:17:56,043 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] temp file location/tmp/259190107897964/0/default/test_table/Fact/Part0/Segment_0/0/sortrowtmp
INFO 09-01 16:17:56,046 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Level cardinality file written to :
/tmp/259190107897964/0/default/test_table/Fact/Part0/Segment_0/0/levelmetadata_test_table.metadata
INFO 09-01 16:17:56,046 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Record Procerssed For table: test_table
INFO 09-01 16:17:56,047 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Summary: Carbon CSV Based Seq Gen Step : 3: Write: 3
INFO 09-01 16:17:56,049 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] File based sorting will be used
INFO 09-01 16:17:56,049 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Record Processed For table: test_table

It looks like an IOException is being swallowed. The relevant source code is:

/**
 * This method will copy the given file to carbon store location
 *
 * @param localFileName local file name with full path
 * @throws CarbonDataWriterException
 */
private void copyCarbonDataFileToCarbonStorePath(String localFileName)
    throws CarbonDataWriterException {
  long copyStartTime = System.currentTimeMillis();
  LOGGER.info("Copying " + localFileName + " --> " + dataWriterVo.getCarbonDataDirectoryPath());
  try {
    CarbonFile localCarbonFile =
        FileFactory.getCarbonFile(localFileName, FileFactory.getFileType(localFileName));
    String carbonFilePath = dataWriterVo.getCarbonDataDirectoryPath() + localFileName
        .substring(localFileName.lastIndexOf(File.separator));
    copyLocalFileToCarbonStore(carbonFilePath, localFileName,
        CarbonCommonConstants.BYTEBUFFER_SIZE,
        getMaxOfBlockAndFileSize(fileSizeInBytes, localCarbonFile.getSize()));
  } catch (IOException e) {
    throw new CarbonDataWriterException(
        "Problem while copying file from local store to carbon store");
  }
  LOGGER.info(
      "Total copy time (ms) to copy file " + localFileName + " is " + (System.currentTimeMillis()
          - copyStartTime));
}

Environment: Spark 1.6.2 standalone cluster + CarbonData 0.2.0 + Hadoop 2.7.2

Could anyone help me with this? Thanks.
Hi
Please use spark-shell to create the CarbonContext; you can refer to this article: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67635497

Regards
Liang
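P.S. In a plain spark-shell (started with the carbondata assembly jar on the classpath), that looks roughly like the minimal sketch below. The store path is only an example, and the two-argument CarbonContext constructor is the one described in the quick-start article above; adjust both to your environment:

// Create a CarbonContext whose store path points at HDFS, so that executors on all
// nodes can read and write carbondata files under it (the path below is just an example).
import org.apache.spark.sql.CarbonContext

val storePath = "hdfs://master:9000/user/hadoop/carbon/store"
val cc = new CarbonContext(sc, storePath)

// The usual SQL commands then run against this context, for example:
cc.sql("load data inpath 'hdfs://master:9000/opt/sample.csv' into table test_table")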
Thanks, Liang Chen.
I've solved the problem; here is my record.

First, I found that the Spark job failed while loading data with the error "CarbonDataWriterException: Problem while copying file from local store to carbon store". The corresponding source code in ./processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java is:

private void copyCarbonDataFileToCarbonStorePath(String localFileName)
    throws CarbonDataWriterException {
  long copyStartTime = System.currentTimeMillis();
  LOGGER.info("Copying " + localFileName + " --> " + dataWriterVo.getCarbonDataDirectoryPath());
  try {
    CarbonFile localCarbonFile =
        FileFactory.getCarbonFile(localFileName, FileFactory.getFileType(localFileName));
    String carbonFilePath = dataWriterVo.getCarbonDataDirectoryPath() + localFileName
        .substring(localFileName.lastIndexOf(File.separator));
    copyLocalFileToCarbonStore(carbonFilePath, localFileName,
        CarbonCommonConstants.BYTEBUFFER_SIZE,
        getMaxOfBlockAndFileSize(fileSizeInBytes, localCarbonFile.getSize()));
  } catch (IOException e) {
    throw new CarbonDataWriterException(
        "Problem while copying file from local store to carbon store");
  }
  LOGGER.info(
      "Total copy time (ms) to copy file " + localFileName + " is " + (System.currentTimeMillis()
          - copyStartTime));
}

The problem is that copyLocalFileToCarbonStore throws an IOException, but the catch block does not tell me the real cause of the error (at moments like this I really prefer technical logs to business logs). So I added a few lines of logging:

  ...
  } catch (IOException e) {
    LOGGER.info("-------------------logs print by liyinwei start---------------------");
    LOGGER.error(e, "");
    LOGGER.info("-------------------logs print by liyinwei end ---------------------");
    throw new CarbonDataWriterException(
        "Problem while copying file from local store to carbon store");
  }
  ...

Then I rebuilt the source code, and it now logs the following:

INFO 10-01 10:29:59,546 - [test_table: Graph - MDKeyGentest_table][partitionID:0] -------------------logs print by liyinwei start---------------------
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0]
java.io.FileNotFoundException: /home/hadoop/carbondata/bin/carbonshellstore/default/test_table/Fact/Part0/Segment_0/part-0-0-1484015398000.carbondata (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    ...
INFO 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] -------------------logs print by liyinwei end ---------------------
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Problem while copying file from local store to carbon store

Second, as you can see, the real cause of the error is a FileNotFoundException: the target path under the carbon store does not exist on that executor node, because the store location is a local directory on the driver rather than a location shared by the whole cluster.
With the help of Liang Chen & Brave heart, I found that the default CarbonData storePath is the following local directory when the shell is started with carbon-spark-shell:

scala> print(cc.storePath)
/home/hadoop/carbondata/bin/carbonshellstore

So I added a parameter when starting carbon-spark-shell:

./bin/carbon-spark-shell --conf spark.carbon.storepath=hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore

and then printed the storePath again:

scala> print(cc.storePath)
hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore

Finally, I ran the command

cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into table test_table")

again and it succeeded, which I verified with:

cc.sql("select * from test_table").show
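For anyone who hits the same problem later, here is a minimal end-to-end sketch of the working flow in the Carbon shell. The store path and CSV location are the ones used above; the table schema is made up purely for illustration (the real columns of sample.csv are not shown in this thread), and the CREATE TABLE syntax follows the CarbonData quick-start examples:

// Assumes the shell was started with the carbon store on HDFS, e.g.:
//   ./bin/carbon-spark-shell --conf spark.carbon.storepath=hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore

// Hypothetical two-column schema; replace it with the real columns of sample.csv.
cc.sql("create table if not exists test_table (id string, name string) stored by 'carbondata'")

// Load from HDFS; the resulting carbondata files are written under the HDFS store path,
// which every executor can reach.
cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into table test_table")

// Verify the load.
cc.sql("select * from test_table").show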
Hi liyinwei
Very good! You are the fastest learner of Apache CarbonData I have met! Could you raise a mailing list discussion about improving the log info you mentioned? Looking forward to seeing your code contribution :)

Regards
Liang