Payal created CARBONDATA-602:
--------------------------------
Summary: When we load data 3 or 4 times, it throws an error
Key: CARBONDATA-602
URL: https://issues.apache.org/jira/browse/CARBONDATA-602
Project: CarbonData
Issue Type: Bug
Components: data-load
Reporter: Payal

When we load data using 'USE_KETTLE'='false' together with 'SINGLE_PASS'='true', the load fails with:

Error: java.lang.Exception: Dataload failed due to error while write dictionary file! (state=,code=0)

Without 'USE_KETTLE'='false', the data load succeeds.

For example:

CREATE TABLE uniqdata_INCLUDEDICTIONARY (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

LOAD DATA INPATH 'hdfs://hadoop-master:54311/data/uniqdata/7000_UniqData.csv'
into table uniqdata_INCLUDEDICTIONARY
OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1', 'SINGLE_PASS'='true', 'USE_KETTLE'='false');

Error: java.lang.Exception: Dataload failed due to error while write dictionary file! (state=,code=0)

The same load without 'USE_KETTLE'='false' succeeds:

LOAD DATA INPATH 'hdfs://hadoop-master:54311/data/uniqdata/7000_UniqData.csv'
into table uniqdata_INCLUDEDICTIONARY
OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1', 'SINGLE_PASS'='true');

+---------+--+
| Result  |
+---------+--+
+---------+--+

Driver log for the failing load (with 'USE_KETTLE'='false'):

INFO 06-01 13:31:54,820 - Running query 'LOAD DATA INPATH 'hdfs://hadoop-master:54311/data/uniqdata/7000_UniqData.csv' into table uniqdata_INCLUDEDICTIONARY OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='true','USE_KETTLE' ='false')' with 2e6007f7-946d-4071-a73f-30d90538ebd6
INFO 06-01 13:31:54,820 - pool-26-thread-58 Query [LOAD DATA INPATH 'HDFS://HADOOP-MASTER:54311/DATA/UNIQDATA/7000_UNIQDATA.CSV' INTO TABLE UNIQDATA_INCLUDEDICTIONARY OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,DOUBLE_COLUMN1,DOUBLE_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='TRUE','USE_KETTLE' ='FALSE')]
INFO 06-01 13:31:54,831 - Successfully able to get the table metadata file lock
INFO 06-01 13:31:54,834 - pool-26-thread-58 Initiating Direct Load for the Table : (meradb.uniqdata_includedictionary)
AUDIT 06-01 13:31:54,838 - [deepak-Vostro-3546][hduser][Thread-494]Data load request has been received for table meradb.uniqdata_includedictionary
AUDIT 06-01 13:31:54,838 - [deepak-Vostro-3546][hduser][Thread-494]Data is loading with New Data Flow for table meradb.uniqdata_includedictionary
INFO 06-01 13:31:54,891 - pool-26-thread-58 [Block Distribution]
INFO 06-01 13:31:54,891 - pool-26-thread-58 totalInputSpaceConsumed: 1505367 , defaultParallelism: 8
INFO 06-01 13:31:54,891 - pool-26-thread-58 mapreduce.input.fileinputformat.split.maxsize: 16777216
INFO 06-01 13:31:54,891 - Total input paths to process : 1
INFO 06-01 13:31:54,892 - pool-26-thread-58 Executors configured : 1
INFO 06-01 13:31:54,893 - pool-26-thread-58 Requesting total executors: 1
INFO 06-01 13:31:54,897 - pool-26-thread-58 Total Time taken to ensure the required executors : 3
INFO 06-01 13:31:54,897 - pool-26-thread-58 Time elapsed to allocate the required executors: 0
INFO 06-01 13:31:54,898 - pool-26-thread-58 Total Time taken in block allocation: 6
INFO 06-01 13:31:54,898 - pool-26-thread-58 Total no of blocks: 1, No.of Nodes: 1
INFO 06-01 13:31:54,898 - pool-26-thread-58 #Node: hadoop-slave-1 no.of.blocks: 1 , mismatch locations: ,knoldus
INFO 06-01 13:31:55,057 - Block broadcast_62 stored as values in memory (estimated size 150.4 MB, free 300.0 MB)
INFO 06-01 13:31:55,064 - Block broadcast_62_piece0 stored as bytes in memory (estimated size 19.7 KB, free 300.0 MB)
INFO 06-01 13:31:55,064 - Added broadcast_62_piece0 in memory on 192.168.2.174:32778 (size: 19.7 KB, free: 511.0 MB)
INFO 06-01 13:31:55,064 - Created broadcast 62 from broadcast at NewCarbonDataLoadRDD.scala:109
INFO 06-01 13:31:55,067 - Starting job: collect at CarbonDataRDDFactory.scala:632
INFO 06-01 13:31:55,067 - Got job 31 (collect at CarbonDataRDDFactory.scala:632) with 1 output partitions
INFO 06-01 13:31:55,067 - Final stage: ResultStage 38 (collect at CarbonDataRDDFactory.scala:632)
INFO 06-01 13:31:55,067 - Parents of final stage: List()
INFO 06-01 13:31:55,067 - Missing parents: List()
INFO 06-01 13:31:55,068 - Submitting ResultStage 38 (NewCarbonDataLoadRDD[150] at RDD at NewCarbonDataLoadRDD.scala:91), which has no missing parents
INFO 06-01 13:31:55,068 - Preferred Location for split : hadoop-slave-1
INFO 06-01 13:31:55,069 - Block broadcast_63 stored as values in memory (estimated size 12.0 KB, free 300.0 MB)
INFO 06-01 13:31:55,070 - Block broadcast_63_piece0 stored as bytes in memory (estimated size 5.8 KB, free 300.0 MB)
INFO 06-01 13:31:55,070 - Added broadcast_63_piece0 in memory on 192.168.2.174:32778 (size: 5.8 KB, free: 511.0 MB)
INFO 06-01 13:31:55,071 - Created broadcast 63 from broadcast at DAGScheduler.scala:1006
INFO 06-01 13:31:55,071 - Submitting 1 missing tasks from ResultStage 38 (NewCarbonDataLoadRDD[150] at RDD at NewCarbonDataLoadRDD.scala:91)
INFO 06-01 13:31:55,071 - Adding task set 38.0 with 1 tasks
INFO 06-01 13:31:55,072 - Starting task 0.0 in stage 38.0 (TID 92, hadoop-slave-1, partition 0,NODE_LOCAL, 2498 bytes)
INFO 06-01 13:31:55,083 - Added broadcast_63_piece0 in memory on hadoop-slave-1:34995 (size: 5.8 KB, free: 511.0 MB)
INFO 06-01 13:31:55,096 - Added broadcast_62_piece0 in memory on hadoop-slave-1:34995 (size: 19.7 KB, free: 511.0 MB)
AUDIT 06-01 13:31:55,120 - [deepak-Vostro-3546][hduser][Thread-428]Connected org.apache.carbondata.core.dictionary.server.DictionaryServerHandler@7c9223ef
INFO 06-01 13:31:56,510 - Finished task 0.0 in stage 38.0 (TID 92) in 1439 ms on hadoop-slave-1 (1/1)
INFO 06-01 13:31:56,510 - Removed TaskSet 38.0, whose tasks have all completed, from pool
INFO 06-01 13:31:56,510 - ResultStage 38 (collect at CarbonDataRDDFactory.scala:632) finished in 1.439 s
INFO 06-01 13:31:56,510 - Job 31 finished: collect at CarbonDataRDDFactory.scala:632, took 1.443490 s
INFO 06-01 13:31:56,511 - pool-26-thread-58 Acquired lock for tablemeradb.uniqdata_includedictionary for table status updation
INFO 06-01 13:31:56,595 - pool-26-thread-58 Successfully deleted the lock file /tmp/meradb/uniqdata_includedictionary/tablestatus.lock
INFO 06-01 13:31:56,595 - pool-26-thread-58 Table unlocked successfully after table status updationmeradb.uniqdata_includedictionary
ERROR 06-01 13:31:56,595 - pool-26-thread-58 Error while close dictionary server and write dictionary file for meradb.uniqdata_includedictionary
ERROR 06-01 13:31:56,595 - pool-26-thread-58 java.lang.Exception: Dataload failed due to error while write dictionary file!
    at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:773)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:470)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:137)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:211)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
AUDIT 06-01 13:31:56,596 - [deepak-Vostro-3546][hduser][Thread-494]Dataload failure for meradb.uniqdata_includedictionary. Please check the logs
INFO 06-01 13:31:56,596 - pool-26-thread-58 Successfully deleted the lock file /tmp/meradb/uniqdata_includedictionary/meta.lock
INFO 06-01 13:31:56,596 - Table MetaData Unlocked Successfully after data load
ERROR 06-01 13:31:56,597 - Error executing query, currentState RUNNING,
java.lang.Exception: Dataload failed due to error while write dictionary file!
    at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:773)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:470)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:137)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:211)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
ERROR 06-01 13:31:56,597 - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: java.lang.Exception: Dataload failed due to error while write dictionary file!
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:246)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)