Hi Community,
During a POC I found some possible issues with dictionary and S3 compatibility, attached as a CSV. Could you please take a look?

Thanks
Aaron

Possible_Issues.csv <http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t357/Possible_Issues.csv>
One typo fix: the Spark version of No. 2 should be 2.2.1, pre-built with Hadoop 2.7.2.
Hi aaron,
Thank you for reporting the issues. Let me look into these; I will reply ASAP.

Thanks
Kunal Kapoor
Hi kunalkapoor,
Thanks very much for your quick response. The issue below concerns me the most, because it would significantly impact our implementation.

For the issue "Dictionary decoding does not work when the dictionary column is used for filter/join on a preaggregate (timeseries) table", I have tested the following combinations with workers distributed across different machines, and all of them behave the same, raising an exception like "Caused by: java.lang.RuntimeException: Error while resolving filter expression":

1. CarbonData 1.4.1 & Spark 2.2.1
2. CarbonData 1.5.0-SNAPSHOT & Spark 2.2.1
3. CarbonData 1.5.0-SNAPSHOT & Spark 2.2.2
4. CarbonData 1.5.0-SNAPSHOT & Spark 2.3.1

We would use many preaggregate tables in our business, and filter & join would be very common cases for us. Looking forward to your good news.

Thanks
Aaron
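In case it helps with reproducing, here is a minimal sketch of the kind of setup we have. The schema is illustrative (columns inferred from the query in my next message), and the datamap name agg_month is a placeholder:

// Fact table with a global dictionary on the columns we filter/join on.
carbon.sql(
  s"""CREATE TABLE store (
     |  date TIMESTAMP,
     |  market_code STRING,
     |  device_code STRING,
     |  country_code STRING,
     |  category_id INT,
     |  est_free_app_download BIGINT)
     |STORED BY 'carbondata'
     |TBLPROPERTIES ('DICTIONARY_INCLUDE'='market_code,device_code,country_code')
     |""".stripMargin)

// Month-granularity timeseries (pre-aggregate) datamap matching the failing query.
carbon.sql(
  s"""CREATE DATAMAP agg_month ON TABLE store
     |USING 'timeseries'
     |DMPROPERTIES ('EVENT_TIME'='date', 'MONTH_GRANULARITY'='1')
     |AS SELECT date, market_code, device_code, country_code, category_id,
     |   sum(est_free_app_download)
     |FROM store
     |GROUP BY date, market_code, device_code, country_code, category_id
     |""".stripMargin)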
Hi kunalkapoor,
More info for you.

1. One comment about how to reproduce this: the query was distributed to Spark workers on different nodes for execution.

2. Detailed stack trace:

scala> carbon.time(carbon.sql(
     |   s"""SELECT sum(est_free_app_download), timeseries(date, 'MONTH'), country_code
     |      |FROM store WHERE market_code='apple-store' and device_code='ios-phone' and country_code IN ('US', 'CN')
     |      |GROUP BY timeseries(date, 'MONTH'), market_code, device_code, country_code, category_id""".stripMargin).show(truncate=false))
18/09/23 23:42:42 AUDIT CacheProvider: [ec2-dca-aa-p-sdn-16.appannie.org][hadoop][Thread-1]The key carbon.query.directQueryOnDataMap.enabled with value true added in the session param
[Stage 0:>                                                          (0 + 2) / 2]
18/09/23 23:42:46 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10.2.3.19, executor 1): java.lang.RuntimeException: Error while resolving filter expression
    at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1043)
    at org.apache.carbondata.core.scan.model.QueryModelBuilder.build(QueryModelBuilder.java:322)
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.createQueryModel(CarbonInputFormat.java:632)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:419)
    at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at org.apache.carbondata.core.scan.executor.util.QueryUtil.getTableIdentifierForColumn(QueryUtil.java:401)
    at org.apache.carbondata.core.scan.filter.FilterUtil.getForwardDictionaryCache(FilterUtil.java:1416)
    at org.apache.carbondata.core.scan.filter.FilterUtil.getFilterValues(FilterUtil.java:712)
    at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.visitor.DictionaryColumnVisitor.populateFilterResolvedInfo(DictionaryColumnVisitor.java:60)
    at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo.populateFilterInfoBasedOnColumnType(DimColumnResolvedFilterInfo.java:119)
    at org.apache.carbondata.core.scan.filter.resolver.ConditionalFilterResolverImpl.resolve(ConditionalFilterResolverImpl.java:107)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:255)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolvertree(FilterExpressionProcessor.java:235)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:84)
    at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1041)
    ... 19 more

18/09/23 23:42:48 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 10.2.3.19, executor 1): java.lang.RuntimeException: Error while resolving filter expression
    at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1043)
    at org.apache.carbondata.core.scan.model.QueryModelBuilder.build(QueryModelBuilder.java:322)
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.createQueryModel(CarbonInputFormat.java:632)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:419)
    at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at org.apache.carbondata.core.scan.executor.util.QueryUtil.getTableIdentifierForColumn(QueryUtil.java:401)
    at org.apache.carbondata.core.scan.filter.FilterUtil.getForwardDictionaryCache(FilterUtil.java:1416)
    at org.apache.carbondata.core.scan.filter.FilterUtil.getFilterValues(FilterUtil.java:712)
    at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.visitor.DictionaryColumnVisitor.populateFilterResolvedInfo(DictionaryColumnVisitor.java:60)
    at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo.populateFilterInfoBasedOnColumnType(DimColumnResolvedFilterInfo.java:119)
    at org.apache.carbondata.core.scan.filter.resolver.ConditionalFilterResolverImpl.resolve(ConditionalFilterResolverImpl.java:107)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:255)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolvertree(FilterExpressionProcessor.java:235)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:84)
    at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1041)
    ... 19 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1533)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1521)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1520)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1520)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1748)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1703)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1692)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2865)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2154)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2154)
    at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2846)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2845)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2154)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2367)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:241)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:643)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:620)
    at $anonfun$1.apply$mcV$sp(<console>:37)
    at $anonfun$1.apply(<console>:37)
    at $anonfun$1.apply(<console>:37)
    at org.apache.spark.sql.SparkSession.time(SparkSession.scala:667)
    ... 51 elided
Caused by: java.lang.RuntimeException: Error while resolving filter expression
    at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1043)
    at org.apache.carbondata.core.scan.model.QueryModelBuilder.build(QueryModelBuilder.java:322)
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.createQueryModel(CarbonInputFormat.java:632)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:419)
    at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at org.apache.carbondata.core.scan.executor.util.QueryUtil.getTableIdentifierForColumn(QueryUtil.java:401)
    at org.apache.carbondata.core.scan.filter.FilterUtil.getForwardDictionaryCache(FilterUtil.java:1416)
    at org.apache.carbondata.core.scan.filter.FilterUtil.getFilterValues(FilterUtil.java:712)
    at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.visitor.DictionaryColumnVisitor.populateFilterResolvedInfo(DictionaryColumnVisitor.java:60)
    at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo.populateFilterInfoBasedOnColumnType(DimColumnResolvedFilterInfo.java:119)
    at org.apache.carbondata.core.scan.filter.resolver.ConditionalFilterResolverImpl.resolve(ConditionalFilterResolverImpl.java:107)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:255)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolvertree(FilterExpressionProcessor.java:235)
    at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:84)
    at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1041)
    ... 19 more
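For completeness, the session property visible in the AUDIT line above had been set beforehand, roughly like this (a sketch; this property lets queries run directly against the pre-aggregate/timeseries datamap tables):

// Mirrors the "carbon.query.directQueryOnDataMap.enabled" session param in the AUDIT log.
carbon.sql("SET carbon.query.directQueryOnDataMap.enabled=true")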
Hi aaron,
I was able to reproduce Issue1, and it is a bug in the current implementation of the code. This issue is not related to S3; I was able to reproduce the same on HDFS as well. I have created a JIRA, which you can track via the following link:
https://issues.apache.org/jira/browse/CARBONDATA-2967

Thanks
Kunal Kapoor
Hi kunalkapoor,
Thanks very much for your quick response!

1. For the global dictionary issue, do you have a rough plan for the fix?
2. How is the local dictionary bug on Spark 2.3.1 coming along?

Looking forward to the fix!

Thanks
Aaron
Hi aaron,
1. I have already started working on the issue. It requires some refactoring as well.
2. I am trying to reproduce this issue. Will update you soon.

Thanks
Thanks a lot! Looking forward to your good news.
Hi aaron,
For Issue2, can you cherry-pick https://github.com/apache/carbondata/pull/2761 and try? I think it should solve your problem.

Thanks
Thanks, I've checked already and it works well! Impressively quick response!
Hi aaron
I have raised a PR for Issue1. Can you cherry-pick the commit below and try?
https://github.com/apache/carbondata/pull/2786

Thanks
Kunal Kapoor
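Once the rebuilt jars are deployed on all nodes, re-running the original query should confirm the fix; for example, the exact query from the reproduce step earlier in this thread:

// The previously failing dictionary-filter query against the timeseries datamap.
carbon.time(carbon.sql(
  s"""SELECT sum(est_free_app_download), timeseries(date, 'MONTH'), country_code
     |FROM store WHERE market_code='apple-store' and device_code='ios-phone' and country_code IN ('US', 'CN')
     |GROUP BY timeseries(date, 'MONTH'), market_code, device_code, country_code, category_id""".stripMargin).show(truncate = false))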
Wow, cool! I will give it a try!
It works, thanks a lot!