query err 'NullPointerException' but fine after table cached in memory
Posted by Li Peng on Jan 12, 2017; 5:59am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/query-err-NullPointerException-but-fine-after-table-cached-in-memory-tp6032.html
Hello,
use carbondata 0.2.0, following is the problem:
Only one column 'store_id' throws NullPointerException when query, but it works fine when some value or table is cached in memory.
store_id's type is int, cardinality is 200 Thousand, is configured about dictionary and inverted index.
sql:
select order_code,saletype,checkout_date,cashier_code,item_cont,invoice_price,giveamt,saleamt from store.sale where store_id=299998
error:
ERROR 12-01 10:40:16,861 - [Executor task launch worker-0][partitionID:sale;queryID:1438806645368420_0]
java.lang.NullPointerException
at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:117)
at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:107)
at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:43)
at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:216)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR 12-01 10:40:16,874 - Exception in task 0.1 in stage 0.0 (TID 1)
java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:226)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
-------------------------------------------------------------------------------------------------------------------------------
Same err 'NullPointerException' in following sql:
select * from store.sale where store_id=100000
select * from store.sale where store_id=100001
select * from store.sale where store_id=100002
select * from store.sale where store_id=100006
select * from store.sale where store_id=100011
select * from store.sale where store_id=299999
But fine and can return results in following sql:
select * from store.sale where store_id=100008
select * from store.sale where store_id=100009
select * from store.sale where store_id=100010
select * from store.sale where store_id=100013
select * from store.sale where store_id=100027
select * from store.sale limit 10
select count(*) from store.sale
select * from store.sale where store_id=100005
select count(*) from store.sale where store_id=100005
select distinct(store_id) from store.sale order by store_id
But all work fine when table is cached to memory, not throwing 'NullPointerException'.