[Issue] Dictionary and S3

[Issue] Dictionary and S3

aaron
Hi Community,

I found some possible issues with dictionary and S3 compatibility during a
POC. I have attached them in a CSV; could you please have a look?


Thanks
Aaron

Attachment: Possible_Issues.csv
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t357/Possible_Issues.csv>




Re: Issues about dictionary and S3

aaron
One typo fix: the Spark version for issue No. 2 should be 2.2.1, pre-built
with Hadoop 2.7.2.




Re: Issues about dictionary and S3

kunalkapoor
Hi aaron,
Thank you for reporting the issues. Let me have a look into them;
I will reply as soon as possible.


Thanks
Kunal Kapoor


Re: Issues about dictionary and S3

aaron
Hi kunalkapoor,

Thanks very much for your quick response. I care most about the issue below,
because it would have a large impact on our implementation.

For the issue "Dictionary decoding does not work when the dictionary column
is used for filter/join on a preaggregate (timeseries) table", I have tested
the following combinations with workers distributed across different
machines, and all of them behave the same, raising an exception like "Caused
by: java.lang.RuntimeException: Error while resolving filter expression":

1. CarbonData 1.4.1 & Spark 2.2.1
2. CarbonData 1.5.0-SNAPSHOT & Spark 2.2.1
3. CarbonData 1.5.0-SNAPSHOT & Spark 2.2.2
4. CarbonData 1.5.0-SNAPSHOT & Spark 2.3.1

We will use many preaggregate tables in our business, and filter & join are
very common cases for us. A sketch of the kind of setup that hits this is
below.
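
The following is a minimal sketch of the kind of schema and query that
trigger the exception for us; the table and column names are illustrative
rather than our real schema, and it assumes the 1.4/1.5 SQL syntax for
dictionary columns and timeseries datamaps:

// Dictionary-encoded dimension columns on the main table.
carbon.sql(
  """CREATE TABLE store (
    |  date TIMESTAMP, market_code STRING, device_code STRING,
    |  country_code STRING, category_id INT, est_free_app_download BIGINT)
    |STORED BY 'carbondata'
    |TBLPROPERTIES ('DICTIONARY_INCLUDE'='market_code,device_code,country_code')""".stripMargin)

// A month-granularity timeseries (preaggregate) datamap over the table.
carbon.sql(
  """CREATE DATAMAP agg_month ON TABLE store USING 'timeseries'
    |DMPROPERTIES ('EVENT_TIME'='date', 'MONTH_GRANULARITY'='1')
    |AS SELECT date, market_code, device_code, country_code, category_id,
    |   SUM(est_free_app_download) FROM store
    |GROUP BY date, market_code, device_code, country_code, category_id""".stripMargin)

// Filtering on a dictionary column, which routes the query to the timeseries
// datamap, then fails with the resolve-filter NullPointerException below.
carbon.sql(
  """SELECT timeseries(date, 'MONTH'), country_code, SUM(est_free_app_download)
    |FROM store WHERE country_code IN ('US', 'CN')
    |GROUP BY timeseries(date, 'MONTH'), country_code""".stripMargin).show()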

Looking forward to your good news.

Thanks
Aaron

 





Re: Issues about dictionary and S3

aaron
Hi kunalkapoor,

More info for you.

*1. How to reproduce* - the query was distributed to Spark workers on
different nodes for execution.

*2. Detailed stacktrace*

scala> carbon.time(carbon.sql(
     |       s"""SELECT sum(est_free_app_download), timeseries(date, 'MONTH'), country_code
     |          |FROM store WHERE market_code='apple-store' and device_code='ios-phone' and country_code IN ('US', 'CN')
     |          |GROUP BY timeseries(date, 'MONTH'), market_code, device_code, country_code, category_id""".stripMargin).show(truncate=false))
18/09/23 23:42:42 AUDIT CacheProvider: [ec2-dca-aa-p-sdn-16.appannie.org][hadoop][Thread-1]The key carbon.query.directQueryOnDataMap.enabled with value true added in the session param
[Stage 0:> (0 + 2) / 2]
18/09/23 23:42:46 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10.2.3.19, executor 1): java.lang.RuntimeException: Error while resolving filter expression
        at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1043)
        at org.apache.carbondata.core.scan.model.QueryModelBuilder.build(QueryModelBuilder.java:322)
        at org.apache.carbondata.hadoop.api.CarbonInputFormat.createQueryModel(CarbonInputFormat.java:632)
        at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:419)
        at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.core.scan.executor.util.QueryUtil.getTableIdentifierForColumn(QueryUtil.java:401)
        at org.apache.carbondata.core.scan.filter.FilterUtil.getForwardDictionaryCache(FilterUtil.java:1416)
        at org.apache.carbondata.core.scan.filter.FilterUtil.getFilterValues(FilterUtil.java:712)
        at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.visitor.DictionaryColumnVisitor.populateFilterResolvedInfo(DictionaryColumnVisitor.java:60)
        at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo.populateFilterInfoBasedOnColumnType(DimColumnResolvedFilterInfo.java:119)
        at org.apache.carbondata.core.scan.filter.resolver.ConditionalFilterResolverImpl.resolve(ConditionalFilterResolverImpl.java:107)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:255)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolvertree(FilterExpressionProcessor.java:235)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:84)
        at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1041)
        ... 19 more

18/09/23 23:42:48 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 10.2.3.19, executor 1): java.lang.RuntimeException: Error while resolving filter expression
        at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1043)
        at org.apache.carbondata.core.scan.model.QueryModelBuilder.build(QueryModelBuilder.java:322)
        at org.apache.carbondata.hadoop.api.CarbonInputFormat.createQueryModel(CarbonInputFormat.java:632)
        at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:419)
        at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.core.scan.executor.util.QueryUtil.getTableIdentifierForColumn(QueryUtil.java:401)
        at org.apache.carbondata.core.scan.filter.FilterUtil.getForwardDictionaryCache(FilterUtil.java:1416)
        at org.apache.carbondata.core.scan.filter.FilterUtil.getFilterValues(FilterUtil.java:712)
        at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.visitor.DictionaryColumnVisitor.populateFilterResolvedInfo(DictionaryColumnVisitor.java:60)
        at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo.populateFilterInfoBasedOnColumnType(DimColumnResolvedFilterInfo.java:119)
        at org.apache.carbondata.core.scan.filter.resolver.ConditionalFilterResolverImpl.resolve(ConditionalFilterResolverImpl.java:107)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:255)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolvertree(FilterExpressionProcessor.java:235)
        at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:84)
        at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1041)
        ... 19 more

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1533)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1521)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1520)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1520)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1748)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1703)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1692)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2865)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2154)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2154)
  at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2846)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2845)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2154)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2367)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:241)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:643)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:620)
  at $anonfun$1.apply$mcV$sp(<console>:37)
  at $anonfun$1.apply(<console>:37)
  at $anonfun$1.apply(<console>:37)
  at org.apache.spark.sql.SparkSession.time(SparkSession.scala:667)
  ... 51 elided
Caused by: java.lang.RuntimeException: Error while resolving filter expression
  at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1043)
  at org.apache.carbondata.core.scan.model.QueryModelBuilder.build(QueryModelBuilder.java:322)
  at org.apache.carbondata.hadoop.api.CarbonInputFormat.createQueryModel(CarbonInputFormat.java:632)
  at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:419)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
  at org.apache.carbondata.core.scan.executor.util.QueryUtil.getTableIdentifierForColumn(QueryUtil.java:401)
  at org.apache.carbondata.core.scan.filter.FilterUtil.getForwardDictionaryCache(FilterUtil.java:1416)
  at org.apache.carbondata.core.scan.filter.FilterUtil.getFilterValues(FilterUtil.java:712)
  at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.visitor.DictionaryColumnVisitor.populateFilterResolvedInfo(DictionaryColumnVisitor.java:60)
  at org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo.populateFilterInfoBasedOnColumnType(DimColumnResolvedFilterInfo.java:119)
  at org.apache.carbondata.core.scan.filter.resolver.ConditionalFilterResolverImpl.resolve(ConditionalFilterResolverImpl.java:107)
  at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:255)
  at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
  at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
  at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
  at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
  at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolvertree(FilterExpressionProcessor.java:235)
  at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:84)
  at org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1041)
  ... 19 more
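
For reference, the carbon session used above was created roughly as follows.
This is a sketch assuming the CarbonSession builder API in 1.4/1.5; the
master, endpoint, credentials, and store path are placeholders, not our real
values:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

// Placeholder S3 settings; substitute real credentials, endpoint, and bucket.
val carbon = SparkSession
  .builder()
  .master("spark://<master-host>:7077")
  .appName("CarbonDictionaryS3PoC")
  .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
  .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
  .config("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com")
  .getOrCreateCarbonSession("s3a://<bucket>/carbon-store")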





Re: Issues about dictionary and S3

kunalkapoor
Hi aaron,
I was able to reproduce Issue 1, and it is a bug in the current
implementation. The issue is not related to S3; I was able to reproduce it
on HDFS as well.

I have created a JIRA; you can track it at the following link:
https://issues.apache.org/jira/browse/CARBONDATA-2967

Thanks
Kunal Kapoor


Re: Issues about dictionary and S3

aaron
Hi kunalkapoor,

Thanks very much for your quick response!

1. For the global dictionary issue, do you have a rough plan for the fix?
2. How is the local dictionary bug on Spark 2.3.1 coming along?

Looking forward to the fix!
Thanks
Aaron




Re: Issues about dictionary and S3

kunalkapoor
Hi aaron,
1. I have already started working on the issue; it requires some
refactoring as well.
2. I am trying to reproduce this issue and will update you soon. A sketch
of the setup I am using to reproduce it is below.
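
In case it helps us compare setups: the table I am testing with enables
local dictionary roughly like this. This is a sketch with illustrative
names, assuming the LOCAL_DICTIONARY table properties available since
CarbonData 1.4.1:

// Local dictionary enabled explicitly for the string dimension columns.
carbon.sql(
  """CREATE TABLE store_local_dict (
    |  date TIMESTAMP, market_code STRING, device_code STRING,
    |  country_code STRING, est_free_app_download BIGINT)
    |STORED BY 'carbondata'
    |TBLPROPERTIES ('LOCAL_DICTIONARY_ENABLE'='true',
    |               'LOCAL_DICTIONARY_INCLUDE'='market_code,device_code,country_code')""".stripMargin)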


Thanks


Re: Issues about dictionary and S3

aaron
Thanks a lot! Looking forward to your good news.




Re: Issues about dictionary and S3

kunalkapoor
Hi aaron,
For Issue 2, can you cherry-pick
https://github.com/apache/carbondata/pull/2761 and try?
I think it should solve your problem.

Thanks


Re: Issues about dictionary and S3

aaron
Thanks, I've checked it already and it works well! A very impressively quick
response!




Re: Issues about dictionary and S3

kunalkapoor
Hi aaron,
I have raised a PR for issue1.
Can you cherry-pick the below commit and try?

https://github.com/apache/carbondata/pull/2786
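
After cherry-picking and rebuilding, re-running the query that failed
earlier should confirm the fix; a quick check along these lines (table and
column names as in your earlier repro) should return rows instead of
throwing the resolve-filter exception:

// Previously failed with "Error while resolving filter expression" caused
// by the NullPointerException in QueryUtil.getTableIdentifierForColumn.
carbon.sql(
  """SELECT sum(est_free_app_download), timeseries(date, 'MONTH'), country_code
    |FROM store WHERE market_code='apple-store' and device_code='ios-phone' and country_code IN ('US', 'CN')
    |GROUP BY timeseries(date, 'MONTH'), market_code, device_code, country_code, category_id""".stripMargin)
  .show(truncate = false)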

Thanks
Kunal Kapoor


Re: Issues about dictionary and S3

aaron
Wow, cool!  I will have a try!




Re: Issues about dictionary and S3

aaron