Posted by 喜之郎 on Apr 17, 2018; 1:00pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/query-on-string-type-return-error-tp44531p45531.html
I use apache-carbondata-1.3.1-bin-spark2.2.1-hadoop2.7.2.jar, which I downloaded from the website; I did not build it myself.
So I don't know the Thrift version, and I have not updated the carbon version.
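For anyone else who hits this: DataChunk3 is part of the carbon file format's Thrift-defined blocklet metadata, so a mismatch between the format version that wrote the files and the one reading them can surface as a missing required field. A reasonable first check is whether all of the table's segments loaded successfully. A minimal sketch, assuming a spark-shell with a CarbonSession bound to `carbon` and the default database; only the table name is taken from the thread:

  // List segment load status; a partially loaded or corrupted segment
  // can leave blocklet metadata that the query-side reader cannot decode.
  carbon.sql("SHOW SEGMENTS FOR TABLE action_carbondata").show(false)

  // Re-run the count without the filter to see whether the error is
  // tied to the filtered column or to the data files in general.
  carbon.sql("SELECT count(*) FROM action_carbondata").show()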
------------------ Original Message ------------------
From: "xuchuanyin" <[hidden email]>;
Sent: Monday, April 16, 2018, 7:04 PM
To: "carbondata" <[hidden email]>;
Subject: Re: query on string type return error
I think the problem may be metadata related. What's your Thrift version? Have you updated the carbon version recently, after the data was loaded?

FROM MOBILE EMAIL CLIENT

On 04/16/2018 15:51, Liang Chen wrote:

Hi,

From the log message, it seems like the data files can't be found. Can you provide more detail:
1. How did you create the CarbonSession and how did you load the data?
2. Have you deployed a cluster or only a single machine?

Regards,
Liang

喜之郎 wrote
> Hi all, when I use carbondata to run the query "select count(*) from
> action_carbondata where starttimestr = 20180301;", an error occurs.
> This is the error info:
> ###################
> 0: jdbc:hive2://localhost:10000> select count(*) from action_carbondata
> where starttimestr = 20180301;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 12 in stage 7.0 failed 4 times, most recent failure: Lost task 12.3
> in stage 7.0 (TID 173, sz-pg-entanalytics-research-001.tendcloud.com,
> executor 1): org.apache.spark.util.TaskCompletionListenerException:
> org.apache.carbondata.core.scan.executor.exception.QueryExecutionException:
>
> Previous exception in task: java.util.concurrent.ExecutionException:
> java.util.concurrent.ExecutionException: java.io.IOException:
> org.apache.thrift.protocol.TProtocolException: Required field
> 'data_chunk_list' was not present! Struct: DataChunk3(data_chunk_list:null)
>
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.updateScanner(AbstractDataBlockIterator.java:136)
> org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:64)
> org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46)
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:283)
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:171)
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:391)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> org.apache.spark.scheduler.Task.run(Task.scala:108)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
> at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
> at org.apache.spark.scheduler.Task.run(Task.scala:118)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace: (state=,code=0)
> ###################
>
> create table statement:
> CREATE TABLE action_carbondata(
>   cur_appversioncode integer,
>   cur_appversionname integer,
>   cur_browserid integer,
>   cur_carrierid integer,
>   cur_channelid integer,
>   cur_cityid integer,
>   cur_countryid integer,
>   cur_ip string,
>   cur_networkid integer,
>   cur_osid integer,
>   cur_provinceid integer,
>   deviceproductoffset long,
>   duration integer,
>   eventcount integer,
>   eventlabelid integer,
>   eventtypeid integer,
>   organizationid integer,
>   platformid integer,
>   productid integer,
>   relatedaccountproductoffset long,
>   sessionduration integer,
>   sessionid string,
>   sessionstarttime long,
>   sessionstatus integer,
>   sourceid integer,
>   starttime long,
>   starttimestr string)
> PARTITIONED BY (eventid int)
> STORED BY 'carbondata'
> TBLPROPERTIES ('partition_type'='Hash', 'NUM_PARTITIONS'='39',
> 'SORT_COLUMNS'='productid,sourceid,starttimestr,platformid,organizationid,eventtypeid,eventlabelid,cur_channelid,cur_provinceid,cur_countryid,cur_cityid,cur_osid,cur_appversioncode,cur_appversionname,cur_carrierid,cur_networkid,cur_browserid,sessionstatus,cur_ip');
>
> The values of the "starttimestr" field look like:
> 20180303
> 20180304
>
> Any advice is appreciated!
>
> The carbondata version is:
> apache-carbondata-1.3.1-bin-spark2.2.1-hadoop2.7.2.jar
>
> The spark version is:
> spark-2.2.1-bin-hadoop2.7
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
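For context on Liang's first question: the documented CarbonData 1.3.x pattern for creating a CarbonSession and loading data looks roughly like the sketch below. The store path, input path, master, and load options are placeholders; only the table name comes from the thread, and the original poster's actual setup may differ:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.CarbonSession._  // brings in getOrCreateCarbonSession

  // Create a CarbonSession; the HDFS store path is a placeholder.
  val carbon = SparkSession.builder()
    .master("local[*]")
    .appName("carbondata-repro")
    .getOrCreateCarbonSession("hdfs://namenode:8020/user/carbon/store")

  // Load a CSV into the table; the input path and options are placeholders.
  carbon.sql(
    """LOAD DATA INPATH 'hdfs://namenode:8020/tmp/action.csv'
      |INTO TABLE action_carbondata
      |OPTIONS('DELIMITER'=',', 'HEADER'='true')""".stripMargin)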