load data error from csv file at hdfs error in standalone spark cluster


load data error from csv file at hdfs error in standalone spark cluster

李寅威
Hi all,


When I load data from a CSV file on HDFS, one stage of the Spark job fails with the error below. Where can I find a more detailed error message that would help me find the solution? Or does anyone know why this happens and how to fix it?


command:


cc.sql(s"load data inpath 'hdfs://master:9000/opt/sample.csv' into table test_table")


error log:


Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 17, slave2): org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.


Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 17, slave2): org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
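For context, the LOAD DATA statement above is normally issued against a CarbonData table created through the same CarbonContext. A minimal sketch of such a session is shown below; the table schema and the load OPTIONS are assumptions for illustration only, not values taken from this thread.

// Sketch only -- column names and CSV header are hypothetical.
cc.sql("CREATE TABLE IF NOT EXISTS test_table (id STRING, name STRING, city STRING, age INT) STORED BY 'carbondata'")

// Load the CSV from HDFS; DELIMITER/FILEHEADER are shown as typical options, adjust to the actual file.
cc.sql("LOAD DATA INPATH 'hdfs://master:9000/opt/sample.csv' INTO TABLE test_table " +
  "OPTIONS('DELIMITER'=',', 'FILEHEADER'='id,name,city,age')")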

Problem while copying file from local store to carbon store

李寅威
Hi all:

When I load data from HDFS into a table:

cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into table test_table")

two errors occurred. At slave1:


INFO  09-01 16:17:58,611 - test_table: Graph - CSV Input *****************Started all csv reading***********
INFO  09-01 16:17:58,611 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *****************started csv reading by thread***********
INFO  09-01 16:17:58,635 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Total Number of records processed by this thread is: 3
INFO  09-01 16:17:58,635 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Time taken to processed 3 Number of records: 24
INFO  09-01 16:17:58,636 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *****************Completed csv reading by thread***********
INFO  09-01 16:17:58,636 - test_table: Graph - CSV Input *****************Completed all csv reading***********
INFO  09-01 16:17:58,642 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Column cache size not configured. Therefore default behavior will be considered and no LRU based eviction of columns will be done
ERROR 09-01 16:17:58,645 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] org.apache.carbondata.core.util.CarbonUtilException: Either dictionary or its metadata does not exist for column identifier :: ColumnIdentifier [columnId=c70480f9-4336-4186-8bd0-a3bebb50ea6a]
ERROR 09-01 16:17:58,646 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] org.pentaho.di.core.exception.KettleException: org.apache.carbondata.core.util.CarbonUtilException: Either dictionary or its metadata does not exist for column identifier :: ColumnIdentifier [columnId=c70480f9-4336-4186-8bd0-a3bebb50ea6a]
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.initDictionaryCacheInfo(FileStoreSurrogateKeyGenForCSV.java:297)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.populateCache(FileStoreSurrogateKeyGenForCSV.java:270)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.<init>(FileStoreSurrogateKeyGenForCSV.java:144)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.processRow(CarbonCSVBasedSeqGenStep.java:385)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    at java.lang.Thread.run(Thread.java:745)
INFO  09-01 16:17:58,647 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Record Procerssed For table: test_table
INFO  09-01 16:17:58,647 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Summary: Carbon Slice Merger Step: Read: 0: Write: 0
INFO  09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Record Processed For table: test_table
INFO  09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Number of Records was Zero
INFO  09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Summary: Carbon Sort Key Step: Read: 0: Write: 0
INFO  09-01 16:17:58,747 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Graph execution is finished.
ERROR 09-01 16:17:58,748 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Graph Execution had errors
INFO  09-01 16:17:58,749 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Deleted the local store location/tmp/259202084415620/0
INFO  09-01 16:17:58,749 - DataLoad complete
INFO  09-01 16:17:58,749 - Data Loaded successfully with LoadCount:0
INFO  09-01 16:17:58,749 - DataLoad failure
ERROR 09-01 16:17:58,749 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
ERROR 09-01 16:17:58,752 - Exception in task 0.3 in stage 3.0 (TID 8) org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)




at slave2:


INFO  09-01 16:17:55,182 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Copying /tmp/259188927254235/0/default/test_table/Fact/Part0/Segment_0/0/part-0-0-1483949874000.carbondata --> /home/hadoop/carbondata/bin/carbonshellstore/default/test_table/Fact/Part0/Segment_0
INFO  09-01 16:17:55,182 - [test_table: Graph - MDKeyGentest_table][partitionID:0] The configured block size is 1024 MB, the actual carbon file size is 921 Byte, choose the max value 1024 MB as the block size on HDFS
ERROR 09-01 16:17:55,183 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Problem while copying file from local store to carbon store org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException: Problem while copying file from local store to carbon store
    at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.copyCarbonDataFileToCarbonStorePath(AbstractFactDataWriter.java:604)
    at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.closeWriter(AbstractFactDataWriter.java:510)
    at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:879)
    at org.apache.carbondata.processing.mdkeygen.MDKeyGenStep.processingComplete(MDKeyGenStep.java:245)
    at org.apache.carbondata.processing.mdkeygen.MDKeyGenStep.processRow(MDKeyGenStep.java:234)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    at java.lang.Thread.run(Thread.java:745)
INFO  09-01 16:17:55,184 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Record Procerssed For table: test_table
INFO  09-01 16:17:55,184 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Summary: Carbon Slice Merger Step: Read: 1: Write: 0
INFO  09-01 16:17:55,284 - [Executor task launch worker-0][partitionID:default_test_table_c3017cd2-8920-488d-a715-c0d02250148e] Graph execution is finished.
ERROR 09-01 16:17:55,284 - [Executor task launch worker-0][partitionID:default_test_table_c3017cd2-8920-488d-a715-c0d02250148e] Graph Execution had errors
INFO  09-01 16:17:55,285 - [Executor task launch worker-0][partitionID:default_test_table_c3017cd2-8920-488d-a715-c0d02250148e] Deleted the local store location/tmp/259188927254235/0
INFO  09-01 16:17:55,285 - DataLoad complete
INFO  09-01 16:17:55,286 - Data Loaded successfully with LoadCount:0
INFO  09-01 16:17:55,286 - DataLoad failure
ERROR 09-01 16:17:55,286 - [Executor task launch worker-0][partitionID:default_test_table_c3017cd2-8920-488d-a715-c0d02250148e] org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
ERROR 09-01 16:17:55,288 - Exception in task 0.0 in stage 3.0 (TID 5) org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
INFO  09-01 16:17:55,926 - Got assigned task 7
INFO  09-01 16:17:55,926 - Running task 0.2 in stage 3.0 (TID 7)
INFO  09-01 16:17:55,930 - Input split: slave2
INFO  09-01 16:17:55,930 - The Block Count in this node :1
INFO  09-01 16:17:55,931 - [Executor task launch worker-0][partitionID:default_test_table_fa5212b0-3e3c-43e1-ae5e-27396dce020c] ************* Is Columnar Storagetrue
INFO  09-01 16:17:56,011 - [Executor task launch worker-0][partitionID:default_test_table_fa5212b0-3e3c-43e1-ae5e-27396dce020c] Kettle environment initialized
INFO  09-01 16:17:56,027 - [Executor task launch worker-0][partitionID:default_test_table_fa5212b0-3e3c-43e1-ae5e-27396dce020c] ** Using csv file **
INFO  09-01 16:17:56,035 - [Executor task launch worker-0][partitionID:default_test_table_fa5212b0-3e3c-43e1-ae5e-27396dce020c] Graph execution is started /tmp/259190107897964/0/etl/default/test_table/0/0/test_table.ktr
INFO  09-01 16:17:56,035 - test_table: Graph - CSV Input *****************Started all csv reading***********
INFO  09-01 16:17:56,035 - [pool-31-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-31-thread-1] *****************started csv reading by thread***********
INFO  09-01 16:17:56,040 - [pool-31-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-31-thread-1] Total Number of records processed by this thread is: 3
INFO  09-01 16:17:56,041 - [pool-31-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-31-thread-1] Time taken to processed 3 Number of records: 6
INFO  09-01 16:17:56,041 - [pool-31-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-31-thread-1] *****************Completed csv reading by thread***********
INFO  09-01 16:17:56,041 - test_table: Graph - CSV Input *****************Completed all csv reading***********
INFO  09-01 16:17:56,043 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Sort size for table: 500000
INFO  09-01 16:17:56,043 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Number of intermediate file to be merged: 20
INFO  09-01 16:17:56,043 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] File Buffer Size: 1048576
INFO  09-01 16:17:56,043 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] temp file location/tmp/259190107897964/0/default/test_table/Fact/Part0/Segment_0/0/sortrowtmp
INFO  09-01 16:17:56,046 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Level cardinality file written to : /tmp/259190107897964/0/default/test_table/Fact/Part0/Segment_0/0/levelmetadata_test_table.metadata
INFO  09-01 16:17:56,046 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Record Procerssed For table: test_table
INFO  09-01 16:17:56,047 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Summary: Carbon CSV Based Seq Gen Step : 3: Write: 3
INFO  09-01 16:17:56,049 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] File based sorting will be used
INFO  09-01 16:17:56,049 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Record Processed For table: test_table


It looks like an IOException; the relevant source code is as follows:


  /**
   * This method will copy the given file to carbon store location
   *
   * @param localFileName local file name with full path
   * @throws CarbonDataWriterException
   */
  private void copyCarbonDataFileToCarbonStorePath(String localFileName)
      throws CarbonDataWriterException {
    long copyStartTime = System.currentTimeMillis();
    LOGGER.info("Copying " + localFileName + " --> " + dataWriterVo.getCarbonDataDirectoryPath());
    try {
      CarbonFile localCarbonFile =
          FileFactory.getCarbonFile(localFileName, FileFactory.getFileType(localFileName));
      String carbonFilePath = dataWriterVo.getCarbonDataDirectoryPath() + localFileName
          .substring(localFileName.lastIndexOf(File.separator));
      copyLocalFileToCarbonStore(carbonFilePath, localFileName,
          CarbonCommonConstants.BYTEBUFFER_SIZE,
          getMaxOfBlockAndFileSize(fileSizeInBytes, localCarbonFile.getSize()));
    } catch (IOException e) {
      throw new CarbonDataWriterException(
          "Problem while copying file from local store to carbon store");
    }
    LOGGER.info(
        "Total copy time (ms) to copy file " + localFileName + " is " + (System.currentTimeMillis()
            - copyStartTime));
  }



Environment:
Spark 1.6.2 standalone cluster + CarbonData 0.2.0 + Hadoop 2.7.2




Could any of you help me? Thanks!



Re: Problem while copying file from local store to carbon store

Liang Chen
Administrator
Hi

Please use spark-shell to create a CarbonContext; you can refer to this article:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67635497

Regards
Liang
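
A minimal sketch of what "use spark-shell to create a CarbonContext" can look like follows; the jar path and store location are placeholders, and the two-argument CarbonContext constructor is assumed from the 0.x quick-start rather than quoted from this thread:

// Launch a plain spark-shell with the CarbonData assembly on the classpath (jar path is a placeholder):
//   ./bin/spark-shell --jars /path/to/carbondata-assembly.jar

import org.apache.spark.sql.CarbonContext

// Point the store at a location that every node in the cluster can reach, e.g. HDFS.
val cc = new CarbonContext(sc, "hdfs://master:9000/carbon/store")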

Re: Problem while copying file from local store to carbon store

李寅威
Thanks, Chen Liang (陈亮总).


I've solved the problem; here is my record:


first,


I found that the Spark job failed when loading data, with the error "CarbonDataWriterException: Problem while copying file from local store to carbon store". Locating the source code at ./processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter, it shows:


private void copyCarbonDataFileToCarbonStorePath(String localFileName)
      throws CarbonDataWriterException {
    long copyStartTime = System.currentTimeMillis();
    LOGGER.info("Copying " + localFileName + " --> " + dataWriterVo.getCarbonDataDirectoryPath());
    try {
      CarbonFile localCarbonFile =
          FileFactory.getCarbonFile(localFileName, FileFactory.getFileType(localFileName));
      String carbonFilePath = dataWriterVo.getCarbonDataDirectoryPath() + localFileName
          .substring(localFileName.lastIndexOf(File.separator));
      copyLocalFileToCarbonStore(carbonFilePath, localFileName,
          CarbonCommonConstants.BYTEBUFFER_SIZE,
          getMaxOfBlockAndFileSize(fileSizeInBytes, localCarbonFile.getSize()));
    } catch (IOException e) {
      throw new CarbonDataWriterException(
          "Problem while copying file from local store to carbon store");
    }
    LOGGER.info(
        "Total copy time (ms) to copy file " + localFileName + " is " + (System.currentTimeMillis()
            - copyStartTime));
  }



The main reason is that the method copyLocalFileToCarbonStore throws an IOException, but the catch block doesn't tell me the real reason that caused the error (at this moment I really prefer technical logs to business logs). So I added a few lines of code:
...
    } catch (IOException e) {
      LOGGER.info("-------------------logs print by liyinwei start---------------------");
      LOGGER.error(e, "");
      LOGGER.info("-------------------logs print by liyinwei end  ---------------------");
      throw new CarbonDataWriterException(
          "Problem while copying file from local store to carbon store");
    }



Then I rebuilt the source code, and it logged the following:


INFO  10-01 10:29:59,546 - [test_table: Graph - MDKeyGentest_table][partitionID:0] -------------------logs print by liyinwei start---------------------
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0]
java.io.FileNotFoundException: /home/hadoop/carbondata/bin/carbonshellstore/default/test_table/Fact/Part0/Segment_0/part-0-0-1484015398000.carbondata (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        ...
INFO  10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] -------------------logs print by liyinwei end  ---------------------
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Problem while copying file from local store to carbon store



second,


As you can see, the root cause of the error is a FileNotFoundException: the target file under the carbon store path does not exist. That store path is a local directory on each machine rather than a location shared by the whole cluster, which is why the copy from the executors' local store fails. With the help of Liang Chen & Brave heart, I found that the default CarbonData storePath is the following if we start the shell using carbon-spark-shell:
scala> print(cc.storePath)
/home/hadoop/carbondata/bin/carbonshellstore



so I added a parameter when starting carbon-spark-shell:
./bin/carbon-spark-shell --conf spark.carbon.storepath=hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore


and then printed the storePath:
scala> print(cc.storePath)
hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore
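
The same property can presumably also be placed in conf/spark-defaults.conf so it does not have to be repeated on every launch; this relies only on Spark's standard default-configuration mechanism, with the value taken from the command above:

# conf/spark-defaults.conf
spark.carbon.storepath  hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore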





finally,


I ran the command


cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into table test_table")


again, and this time it succeeded. The following query then works:


cc.sql("select * from test_table").show







Re: Problem while copying file from local store to carbon store

Liang Chen
Administrator
Hi liyinwei

Very good! You are the fastest learner of Apache CarbonData I have met!
Could you raise a discussion on the mailing list about improving the log info you mentioned?
Looking forward to seeing your code contribution :)

Regards
Liang




--
Regards
Liang