[jira] [Updated] (CARBONDATA-2928) query failed when doing merge index during load


Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ocean updated CARBONDATA-2928:
------------------------------
    Description:
In CarbonData 1.4.1, the carbonindex files are merged on every load. But when querying through the thriftserver (about 10 QPS) while a merge index operation is in progress, the following error occurs:

 

java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: File does not exist: /warehouse/spark/ae_event_cb_40e_std/productid=534/part-0-9965100033100001_batchno0-0-9865-1536751682754.carbondata
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
 at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.ipc.Client.call(Client.java:1475)
 at org.apache.hadoop.ipc.Client.call(Client.java:1412)
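
To make the race concrete, here is a minimal reproduction sketch in Scala (hedged: the table name, CSV path, pool size, and query count are placeholders, and it assumes a CarbonData 1.4.1 build where every load ends with a merge-index step). It fires concurrent point queries, mimicking the thriftserver workload, while a LOAD runs in parallel:

import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}
import org.apache.spark.sql.SparkSession

object MergeIndexRaceRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("merge-index-race").getOrCreate()

    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))

    // ~10 concurrent point queries, mimicking the thriftserver workload.
    val queries = (1 to 200).map { _ =>
      Future {
        spark.sql(
          "SELECT count(*) FROM ae_event_cb_40e_std WHERE productid = 534"
        ).collect()
      }
    }

    // On 1.4.1 the final step of each load merges the .carbonindex files,
    // which can momentarily invalidate file paths cached by the readers above.
    spark.sql("LOAD DATA INPATH '/tmp/sample.csv' INTO TABLE ae_event_cb_40e_std")

    queries.foreach(Await.ready(_, Duration.Inf))
    spark.stop()
  }
}

If a reader hits the window in which the merge step has already removed the original files, the query fails with the RemoteException above.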

 

  was:
In CarbonData 1.4.1, the carbonindex files are merged on every load. But when querying through the thriftserver (about 10 QPS) while a merge index operation is in progress, the following error occurs:

18/09/12 11:18:25 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
 org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
 Exchange SinglePartition
 +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#1692258L])
 +- *Project
 +- *FileScan carbondata default.ae_event_cb_40e_std[] PushedFilters: [IsNotNull(eventid), IsNotNull(productid), IsNotNull(starttime_day), EqualTo(productid,534), Equa...

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
 at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:115)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
 at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:252)
 at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
 at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
 at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:228)
 at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
 at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2861)
 at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2387)
 at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2387)
 at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2842)
 at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
 at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2841)
 at org.apache.spark.sql.Dataset.collect(Dataset.scala:2387)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:245)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Problem in loading segment blocks.
 at org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:184)
 at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getDataMaps(BlockletDataMapFactory.java:144)
 at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:93)
 at org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapperImpl.prune(DataMapExprWrapperImpl.java:53)
 at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:442)
 at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:378)
 at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:536)
 at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:223)
 at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:122)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
 at scala.Option.getOrElse(Option.scala:121)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
 at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
 at scala.Option.getOrElse(Option.scala:121)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
 at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
 at scala.Option.getOrElse(Option.scala:121)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
 at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
 at org.apache.spark.sql.execution.exchange.ShuffleExchange$.prepareShuffleDependency(ShuffleExchange.scala:264)
 at org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:87)
 at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:124)
 at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:115)
 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
 ... 37 more
 Caused by: java.lang.RuntimeException
 at org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.get(BlockletDataMapIndexStore.java:143)
 at org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:173)
 ... 65 more
 Caused by: java.lang.NullPointerException
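
The cause chain suggests that BlockletDataMapIndexStore served a cached entry still pointing at a .carbondata file the merge-index step had already removed. Until the race is fixed in CarbonData itself, one possible client-side mitigation, sketched below with a hypothetical helper (the name, retry count, and back-off are made up), is to retry queries whose cause chain contains the transient "Problem in loading segment blocks" error:

import org.apache.spark.sql.{Row, SparkSession}

object RetryOnMergeIndexRace {
  // Walk the cause chain looking for the transient segment-load failure.
  private def isTransientSegmentFailure(t: Throwable): Boolean = {
    var cur = t
    while (cur != null) {
      if (Option(cur.getMessage).exists(_.contains("Problem in loading segment blocks")))
        return true
      cur = cur.getCause
    }
    false
  }

  // Retry the query a few times with a linear back-off, giving the
  // merge-index step time to finish rewriting the segment files.
  def collectWithRetry(spark: SparkSession, query: String, maxRetries: Int = 3): Array[Row] = {
    var attempt = 0
    while (true) {
      try {
        return spark.sql(query).collect()
      } catch {
        case e: Exception if attempt < maxRetries && isTransientSegmentFailure(e) =>
          attempt += 1
          Thread.sleep(500L * attempt)
      }
    }
    sys.error("unreachable")
  }
}

This only shrinks the failure window; it is a workaround, not a substitute for making the merge step and the index cache consistent.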


> query failed when doing merge index during load
> -----------------------------------------------
>
>                 Key: CARBONDATA-2928
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2928
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 1.4.1
>            Reporter: ocean
>            Priority: Major
>             Fix For: NONE
>
>
> In CarbonData 1.4.1, the carbonindex files are merged on every load. But when querying through the thriftserver (about 10 QPS) while a merge index operation is in progress, the following error occurs:
>  
> java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: File does not exist: /warehouse/spark/ae_event_cb_40e_std/productid=534/part-0-9965100033100001_batchno0-0-9865-1536751682754.carbondata
>  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
>  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
>  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)