Vandana Yadav created CARBONDATA-2398:
-----------------------------------------

                 Summary: Getting Error while executing concat function on complex data type.
                     Key: CARBONDATA-2398
                     URL: https://issues.apache.org/jira/browse/CARBONDATA-2398
                 Project: CarbonData
              Issue Type: Bug
              Components: data-query
        Affects Versions: 1.4.0
             Environment: spark 2.2
                Reporter: Vandana Yadav
             Attachments: arrayofstruct.csv

Getting an error while executing the concat function on a complex data type.

Steps to Reproduce:

1) Create the table:

create table ARRAY_OF_STRUCT_com (
  CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string,
  EDUCATED string, IS_MARRIED string,
  ARRAY_OF_STRUCT array<struct<ID:int,COUNTRY:string,STATE:string,CITI:string,CHECK_DATE:timestamp>>,
  CARD_COUNT int, DEBIT_COUNT int, CREDIT_COUNT int,
  DEPOSIT double, HQ_DEPOSIT double)
STORED BY 'org.apache.carbondata.format'

2) Load data into the table ('COMPLEX_DELIMITER_LEVEL_1'='$' separates the array elements, 'COMPLEX_DELIMITER_LEVEL_2'='&' separates the struct fields; a hypothetical sample row is sketched after the error log below):

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/complex/arrayofstruct.csv' INTO table ARRAY_OF_STRUCT_com
options ('DELIMITER'=',', 'QUOTECHAR'='"',
  'FILEHEADER'='CUST_ID,YEAR,MONTH,AGE,GENDER,EDUCATED,IS_MARRIED,ARRAY_OF_STRUCT,CARD_COUNT,DEBIT_COUNT,CREDIT_COUNT,DEPOSIT,HQ_DEPOSIT',
  'COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'='&')

3) Execute the query (an alternative element-first form of the same field access is sketched after the error log below):

select concat(array_of_struct.COUNTRY[2],'_',educated) as a from ARRAY_OF_STRUCT_com;

4) Expected Result: it should display the correct result after applying the concat function.

5) Actual Result:

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 30.0 failed 1 times, most recent failure: Lost task 1.0 in stage 30.0 (TID 2078, localhost, executor driver): java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
  at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getArray(rows.scala:48)
  at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:194)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:108)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

6) Error log:

18/04/25 11:21:49 INFO SparkExecuteStatementOperation: Running query 'select concat(array_of_struct.COUNTRY[2],'_',educated) as a from ARRAY_OF_STRUCT_com' with 30bea0e3-52ed-4979-b7c7-f9dd024b70e1
18/04/25 11:21:49 INFO CarbonSparkSqlParser: Parsing command: select concat(array_of_struct.COUNTRY[2],'_',educated) as a from ARRAY_OF_STRUCT_com
18/04/25 11:21:49 INFO HiveMetaStore: 21: get_database: bug
18/04/25 11:21:49 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug
18/04/25 11:21:49 INFO HiveMetaStore: 21: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/04/25 11:21:49 INFO ObjectStore: ObjectStore, initialize called
18/04/25 11:21:49 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
18/04/25 11:21:49 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/04/25 11:21:49 INFO ObjectStore: Initialized ObjectStore
18/04/25 11:21:49 INFO HiveMetaStore: 21: get_table : db=bug tbl=array_of_struct_com
18/04/25 11:21:49 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_table : db=bug tbl=array_of_struct_com
18/04/25 11:21:49 INFO CatalystSqlParser: Parsing command: array<string>
18/04/25 11:21:49 INFO CarbonLRUCache: pool-23-thread-20 Removed entry from InMemory lru cache :: hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_0/0_batchno0-0-1524572335034.carbonindex
18/04/25 11:21:49 INFO CarbonLRUCache: pool-23-thread-20 Removed entry from InMemory lru cache :: hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_1/0_batchno0-0-1524575558281.carbonindex
18/04/25 11:21:49 INFO HiveMetaStore: 21: get_table : db=bug tbl=array_of_struct_com
18/04/25 11:21:49 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_table : db=bug tbl=array_of_struct_com
18/04/25 11:21:49 INFO CatalystSqlParser: Parsing command: array<string>
18/04/25 11:21:49 INFO HiveMetaStore: 21: get_database: bug
18/04/25 11:21:49 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug
18/04/25 11:21:49 INFO HiveMetaStore: 21: get_database: bug
18/04/25 11:21:49 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug
18/04/25 11:21:49 INFO HiveMetaStore: 21: get_tables: db=bug pat=*
18/04/25 11:21:49 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_tables: db=bug pat=*
18/04/25 11:21:49 INFO TableInfo: pool-23-thread-20 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:21:49 INFO CarbonLateDecodeRule: pool-23-thread-20 skip CarbonOptimizer
18/04/25 11:21:49 INFO CarbonLateDecodeRule: pool-23-thread-20 Skip CarbonOptimizer
18/04/25 11:21:49 INFO TableInfo: pool-23-thread-20 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:21:49 INFO BlockletDataMap: pool-23-thread-20 Time taken to load blocklet datamap from file : hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_0/0_batchno0-0-1524572335034.carbonindexis 1
18/04/25 11:21:49 INFO BlockletDataMap: pool-23-thread-20 Time taken to load blocklet datamap from file : hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_1/0_batchno0-0-1524575558281.carbonindexis 2
18/04/25 11:21:49 INFO CarbonScanRDD: Identified no.of.blocks: 2, no.of.tasks: 2, no.of.nodes: 0, parallelism: 4
18/04/25 11:21:49 INFO SparkContext: Starting job: run at AccessController.java:0
18/04/25 11:21:49 INFO DAGScheduler: Got job 17 (run at AccessController.java:0) with 2 output partitions
18/04/25 11:21:49 INFO DAGScheduler: Final stage: ResultStage 30 (run at AccessController.java:0)
18/04/25 11:21:49 INFO DAGScheduler: Parents of final stage: List()
18/04/25 11:21:49 INFO DAGScheduler: Missing parents: List()
18/04/25 11:21:49 INFO DAGScheduler: Submitting ResultStage 30 (MapPartitionsRDD[83] at run at AccessController.java:0), which has no missing parents
18/04/25 11:21:49 INFO MemoryStore: Block broadcast_27 stored as values in memory (estimated size 32.6 KB, free 366.0 MB)
18/04/25 11:21:49 INFO MemoryStore: Block broadcast_27_piece0 stored as bytes in memory (estimated size 26.6 KB, free 366.0 MB)
18/04/25 11:21:49 INFO BlockManagerInfo: Added broadcast_27_piece0 in memory on 192.168.2.102:40679 (size: 26.6 KB, free: 366.2 MB)
18/04/25 11:21:49 INFO SparkContext: Created broadcast 27 from broadcast at DAGScheduler.scala:1006
18/04/25 11:21:49 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 30 (MapPartitionsRDD[83] at run at AccessController.java:0) (first 15 tasks are for partitions Vector(0, 1))
18/04/25 11:21:49 INFO TaskSchedulerImpl: Adding task set 30.0 with 2 tasks
18/04/25 11:21:49 INFO TaskSetManager: Starting task 0.0 in stage 30.0 (TID 2077, localhost, executor driver, partition 0, ANY, 6524 bytes)
18/04/25 11:21:49 INFO TaskSetManager: Starting task 1.0 in stage 30.0 (TID 2078, localhost, executor driver, partition 1, ANY, 6534 bytes)
18/04/25 11:21:49 INFO Executor: Running task 0.0 in stage 30.0 (TID 2077)
18/04/25 11:21:49 INFO Executor: Running task 1.0 in stage 30.0 (TID 2078)
18/04/25 11:21:49 INFO TableInfo: Executor task launch worker for task 2077 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:21:49 INFO AbstractQueryExecutor: [Executor task launch worker for task 2077][partitionID:com;queryID:18618029041374] Query will be executed on table: array_of_struct_com
18/04/25 11:21:49 INFO ResultCollectorFactory: [Executor task launch worker for task 2077][partitionID:com;queryID:18618029041374] Row based dictionary collector is used to scan and collect the data
18/04/25 11:21:49 INFO TableInfo: Executor task launch worker for task 2078 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:21:49 INFO AbstractQueryExecutor: [Executor task launch worker for task 2078][partitionID:com;queryID:18618029041374] Query will be executed on table: array_of_struct_com
18/04/25 11:21:49 INFO ResultCollectorFactory: [Executor task launch worker for task 2078][partitionID:com;queryID:18618029041374] Restructure based dictionary collector is used to scan and collect the data
18/04/25 11:21:49 INFO UnsafeMemoryManager: [Executor task launch worker for task 2077][partitionID:com;queryID:18618029041374] Total memory used after task 18618132148717 is 13854 Current tasks running now are : [18271172672188, 17522539140626, 17607895858118, 18330821230360, 18405469228911, 18618097583871, 18394132241322, 18418328233121, 18431423923731, 18317037545688, 18368469767199, 18254776726806, 18307363580438, 18146866243005, 18385290912031]
18/04/25 11:21:49 INFO Executor: Finished task 0.0 in stage 30.0 (TID 2077). 1903 bytes result sent to driver
18/04/25 11:21:49 INFO UnsafeMemoryManager: [Executor task launch worker for task 2078][partitionID:com;queryID:18618029041374] Total memory used after task 18618151738867 is 13854 Current tasks running now are : [18271172672188, 17522539140626, 17607895858118, 18330821230360, 18405469228911, 18618097583871, 18394132241322, 18418328233121, 18431423923731, 18317037545688, 18368469767199, 18254776726806, 18307363580438, 18146866243005, 18385290912031]
18/04/25 11:21:49 ERROR Executor: Exception in task 1.0 in stage 30.0 (TID 2078)
java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
  at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getArray(rows.scala:48)
  at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:194)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:108)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
18/04/25 11:21:49 INFO TaskSetManager: Finished task 0.0 in stage 30.0 (TID 2077) in 100 ms on localhost (executor driver) (1/2)
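
Note on the load step (step 2): with 'COMPLEX_DELIMITER_LEVEL_1'='$' separating array elements and 'COMPLEX_DELIMITER_LEVEL_2'='&' separating struct fields, an input row for this schema would look roughly like the hypothetical line below. It is illustrative only and is not taken from the attached arrayofstruct.csv:

C001,2015,1,28,M,graduate,yes,1&India&MH&Mumbai&2015-01-01 00:00:00$2&India&KA&Bangalore&2015-02-01 00:00:00$3&India&DL&Delhi&2015-03-01 00:00:00,3,2,1,5000.0,7000.0

Here ARRAY_OF_STRUCT carries three struct elements, so COUNTRY[2] in the failing query would refer to the COUNTRY field of the third element.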
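Note on the query step (step 3): in Spark SQL, array_of_struct.COUNTRY projects the COUNTRY field of every array element into an array<string> (consistent with the "Parsing command: array<string>" entries in the log above), and [2] then picks the third entry of that projected array. The ClassCastException occurs on this field-first path, so as a diagnostic, and a possible workaround assuming the struct element itself is readable, the equivalent element-first form may be worth trying:

-- hypothetical rewrite: index into the array first, then read the field
select concat(array_of_struct[2].COUNTRY, '_', educated) as a from ARRAY_OF_STRUCT_com;

Both forms return the same value for rows with a non-null third element in plain Spark 2.2; whether the element-first form avoids the CarbonData cast error has not been verified here.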