[jira] [Commented] (CARBONDATA-243) Filters on dictionary column not always work.

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (CARBONDATA-243) Filters on dictionary column not always work.

Akash R Nilugal (Jira)

    [ https://issues.apache.org/jira/browse/CARBONDATA-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824367#comment-15824367 ]

Ravindra Pesala commented on CARBONDATA-243:
--------------------------------------------

I think this issue got resolved in latest master. Please verify it once

> Filters on dictionary column not always work.
> ---------------------------------------------
>
>                 Key: CARBONDATA-243
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-243
>             Project: CarbonData
>          Issue Type: Bug
>          Components: examples
>    Affects Versions: 0.2.0-incubating
>         Environment: Ubuntu 14.04
>            Reporter: Yue Shang
>            Priority: Minor
>
> Filter on dictionary column not always work.
> Table loaded contains 100 columns, 100M rows, and one of the column c100's cardinality is 10M. Dictionary include c100.
> DDL statement as follows:
> CREATE TABLE IF NOT EXISTS big_table4
>         (
>         c100 Int,
>         c1 Int, c2 Int, c3 Int, c4 Int, c5 Int, c6 Int, c7 Int, c8 Int, c9 Int, c10 Int, c11 Int, c12 Int, c13 Int, c14 Int, c15 Int, c16 Int, c17 Int, c18 Int, c19 Int, c20 Int,
>         c21 Int, c22 Int, c23 Int, c24 Int, c25 Int, c26 Int, c27 Int, c28 Int, c29 Int, c30 Int, c31 Int, c32 Int, c33 Int, c34 Int, c35 Int, c36 Int, c37 Int, c38 Int, c39 Int, c40 Int,
>         c41 Int, c42 Int, c43 Int, c44 Int, c45 Int, c46 Int, c47 Int, c48 Int, c49 Int, c50 Int, c51 Int, c52 Int, c53 Int, c54 Int, c55 Int, c56 Int, c57 Int, c58 Int, c59 Int, c60 Int,
>         c61 Int, c62 Int, c63 Int, c64 Int, c65 Int, c66 Int, c67 Int, c68 Int, c69 Int, c70 Int, c71 Int, c72 Int, c73 Int, c74 Int, c75 Int, c76 Int, c77 Int, c78 Int, c79 Int, c80 Int,
>         c81 Int, c82 Int, c83 Int, c84 Int, c85 Int, c86 Int, c87 Int, c88 Int, c89 Int, c90 Int, c91 Int, c92 Int, c93 Int, c94 Int, c95 Int, c96 Int, c97 Int, c98 Int, c99 Int,
>          c101 String
>         )
>         STORED BY 'carbondata'
>         TBLPROPERTIES ("DICTIONARY_INCLUDE"="c101,c100, c1, c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,
>                 c15")
> I tried this query to make sure the value i'm querying is exist.
> select c100 from big_table4 where c100 like '1234_'
> ​+-----+
> | c100|
> +-----+
> |12340|
> |12341|
> |12342|
> |12343|
> |12344|
> |12345|
> |12346|
> |12347|
> |12348|
> |12349|
> +-----+
> ​
> But when I tried: " select c100 from big_table4 where c100 like '12345'​  ", I got a runtime error(Log is at the bottom of this mail).
> But the most wield part is, some values can be queried while others not.
> cc.sql("select c100 from big_table4 where c100 like '1000016'")
> This query will get the exact correct answer.
> Any idea about this?
> ========================
>        log
> ========================
> ​ERROR 14-09 17:16:52,024 - [Executor task launch worker-0][partitionID:table4;queryID:275884499876481_0]
> java.lang.NullPointerException
> at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:95)
> at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:87)
> at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:47)
> at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:193)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:174)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> ERROR 14-09 17:16:52,026 - Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
> at scala.sys.package$.error(package.scala:27)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:203)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:174)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> ERROR 14-09 17:16:52,038 - Task 0 in stage 0.0 failed 1 times; aborting job
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
> at scala.sys.package$.error(package.scala:27)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:203)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:174)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
> at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:215)
> at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207)
> at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
> at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
> at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1903)
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1384)
> at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1314)
> at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1377)
> at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:178)
> at org.apache.spark.sql.DataFrame.show(DataFrame.scala:401)
> at org.apache.spark.sql.DataFrame.show(DataFrame.scala:362)
> at org.apache.carbondata.examples.MyBigTest$.main(MyBigTest.scala:107)
> at org.apache.carbondata.examples.MyBigTest.main(MyBigTest.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> Caused by: java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
> at scala.sys.package$.error(package.scala:27)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:203)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:174)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)​



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)