Hi all,
When I load a CSV file into a table, an error occurs in the Spark jobs.

Version & Environment:
Spark 1.6.0 + latest version of CarbonData from GitHub + cluster mode

Commands:
cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv' into table test_table")

CSV file data:
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF

Error Description:
collect at CarbonDataRDDFactory.scala:623
Failure Reason: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 8, slave3): org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.

Spark Worker Log:
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: test_table: Graph - CSV Input *****************Started all csv reading***********
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *****************started csv reading by thread***********
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Total Number of records processed by this thread is: 3
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Time taken to processed 3 Number of records: 15
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *****************Completed csv reading by thread***********
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: test_table: Graph - CSV Input *****************Completed all csv reading***********
16/12/28 14:18:40 INFO cache.CarbonLRUCache: [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Column cache size not configured. Therefore default behavior will be considered and no LRU based eviction of columns will be done
16/12/28 14:18:40 ERROR csvbased.CarbonCSVBasedSeqGenStep: [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0]
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.process(CarbonCSVBasedSeqGenStep.java:940)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.processRow(CarbonCSVBasedSeqGenStep.java:515)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.carbondata.core.cache.dictionary.ColumnReverseDictionaryInfo.getSurrogateKey(ColumnReverseDictionaryInfo.java:73)
    at org.apache.carbondata.core.cache.dictionary.AbstractColumnDictionaryInfo.getSurrogateKey(AbstractColumnDictionaryInfo.java:289)
    at org.apache.carbondata.core.cache.dictionary.ReverseDictionary.getSurrogateKey(ReverseDictionary.java:50)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedDimSurrogateKeyGen.generateSurrogateKeys(CarbonCSVBasedDimSurrogateKeyGen.java:150)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.populateOutputRow(CarbonCSVBasedSeqGenStep.java:1233)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.process(CarbonCSVBasedSeqGenStep.java:929)
    ... 3 more
16/12/28 14:18:40 INFO sortdatastep.SortKeyStep: [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Record Processed For table: test_table
16/12/28 14:18:40 INFO step.CarbonSliceMergerStep: [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Record Procerssed For table: test_table

Does anyone have any idea? Thanks!
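In case it helps anyone reproducing this: a quick local sanity check (just a sketch, not part of the original report) that rebuilds the sample.csv above and verifies the header row and per-row column counts match the four-column table schema. The NPE happens during surrogate-key/dictionary generation, so ruling out a malformed input file seems like a reasonable first step.

```shell
# Recreate the sample CSV exactly as in the report above.
cat > sample.csv << 'EOF'
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF

# The header should match the table's column order: id, name, city, age.
head -n 1 sample.csv

# Every data row should have exactly 4 comma-separated fields;
# exit non-zero if any row does not.
awk -F',' 'NR > 1 && NF != 4 { bad = 1 } END { exit bad }' sample.csv \
  && echo "column counts OK"
```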