
Query on string type returns error

Posted by ε–œδΉ‹ιƒŽ on Apr 08, 2018; 6:23am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/query-on-string-type-return-error-tp44531.html

Hi all, when I use CarbonData to run the query "select count(*) from action_carbondata where starttimestr = 20180301;", an error occurs. This is the error info:
###################
0: jdbc:hive2://localhost:10000> select count(*) from action_carbondata where starttimestr = 20180301;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 12 in stage 7.0 failed 4 times, most recent failure: Lost task 12.3 in stage 7.0 (TID 173, sz-pg-entanalytics-research-001.tendcloud.com, executor 1): org.apache.spark.util.TaskCompletionListenerException: org.apache.carbondata.core.scan.executor.exception.QueryExecutionException:


Previous exception in task: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.io.IOException: org.apache.thrift.protocol.TProtocolException: Required field 'data_chunk_list' was not present! Struct: DataChunk3(data_chunk_list:null)
        org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.updateScanner(AbstractDataBlockIterator.java:136)
        org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:64)
        org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46)
        org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:283)
        org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:171)
        org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:391)
        org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
        org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
        org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
        scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        org.apache.spark.scheduler.Task.run(Task.scala:108)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
        at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
        at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
        at org.apache.spark.scheduler.Task.run(Task.scala:118)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Driver stacktrace: (state=,code=0)

###################
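For comparison, this is the same query with the literal quoted so that it matches the string type of starttimestr (just the form I would expect for a string comparison, not a confirmed workaround):

select count(*) from action_carbondata where starttimestr = '20180301';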


Create table statement:
CREATE TABLE action_carbondata(
cur_appversioncode  integer,
cur_appversionname  integer,
cur_browserid  integer,
cur_carrierid  integer,
cur_channelid  integer,
cur_cityid  integer,
cur_countryid  integer,
cur_ip  string,
cur_networkid  integer,
cur_osid  integer,
cur_provinceid  integer,
deviceproductoffset  long,
duration  integer,
eventcount  integer,
eventlabelid  integer,
eventtypeid  integer,
organizationid  integer,
platformid  integer,
productid  integer,
relatedaccountproductoffset  long,
sessionduration  integer,
sessionid  string,
sessionstarttime  long,
sessionstatus  integer,
sourceid  integer,
starttime  long,
starttimestr  string )
partitioned by (eventid int)
STORED BY 'carbondata'
TBLPROPERTIES ('partition_type'='Hash','NUM_PARTITIONS'='39',
'SORT_COLUMNS'='productid,sourceid,starttimestr,platformid,organizationid,eventtypeid,eventlabelid,cur_channelid,cur_provinceid,cur_countryid,cur_cityid,cur_osid,cur_appversioncode,cur_appversionname,cur_carrierid,cur_networkid,cur_browserid,sessionstatus,cur_ip');
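For reference, the column types of the table can be checked with a plain DESCRIBE (standard Spark SQL, nothing CarbonData-specific assumed here), which should show starttimestr as string:

DESCRIBE action_carbondata;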



The value of "starttimestr" field:
20180303
20180304.

Any advice is appreciated!

The CarbonData version is:
apache-carbondata-1.3.1-bin-spark2.2.1-hadoop2.7.2.jar


The Spark version is:
spark-2.2.1-bin-hadoop2.7