I have already arranged to fix this issue and will raise the pull request as
soon as possible. Thanks for your feedback.
> dev,
> This issue has caused great trouble for our production environment. I would
> appreciate it if you could let me know whether there is any plan to fix it.
>
>
> yixu2001
>
> From: BabuLal
> Date: 2018-03-23 00:20
> To: dev
> Subject: Re: Getting [Problem in loading segment blocks] error after doing
> multi update operations
> Hi all,
> I am able to reproduce the same exception in my cluster (the trace is
> listed below).
> ------
> scala> carbon.sql("select count(*) from public.c_compact4").show
> 2018-03-22 20:40:33,105 | WARN | main | main
> spark.sql.sources.options.keys
> expected, but read nothing |
> org.apache.carbondata.common.logging.impl.StandardLogService.logWarnMessage(StandardLogService.java:168)
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
> tree:
> Exchange SinglePartition
> +- *HashAggregate(keys=[], functions=[partial_count(1)],
> output=[count#1443L])
> +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :public,
> Table name :c_compact4, Schema
> :Some(StructType(StructField(id,StringType,true),
> StructField(qqnum,StringType,true), StructField(nick,StringType,true),
> StructField(age,StringType,true), StructField(gender,StringType,true),
> StructField(auth,StringType,true), StructField(qunnum,StringType,true),
> StructField(mvcc,StringType,true))) ] public.c_compact4[]
> at
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
> at
> org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:112)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at
> org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235)
> at
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
> at
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:372)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at
> org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
> at
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
> at
> org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:113)
> at
> org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
> at
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
> at
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
> at
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
> at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2128)
> at
> org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2127)
> at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818)
> at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
> at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
> at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:638)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:597)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:606)
> ... 48 elided
> Caused by: java.io.IOException: Problem in loading segment blocks.
> at
> org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:153)
> at
> org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getDataMaps(BlockletDataMapFactory.java:76)
> at
> org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:72)
> at
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:739)
> at
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:666)
> at
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:426)
> at
> org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:107)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> at
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> at
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
> at
> org.apache.spark.sql.execution.exchange.ShuffleExchange$.prepareShuffleDependency(ShuffleExchange.scala:273)
> at
> org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:84)
> at
> org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:121)
> at
> org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:112)
> at
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> ... 81 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getLocations(AbstractDFSCarbonFile.java:509)
> at
> org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:142)
>
> ---------------- Store location ----------------
> linux-49:/opt/babu # hadoop fs -ls
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/*.deletedelta
> -rw-rw-r--+ 3 hdfs hive 177216 2018-03-22 18:20
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-0_batchno0-0-1521723019528.deletedelta
> -rw-r--r-- 3 hdfs hive 0 2018-03-22 19:35
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-0_batchno0-0-1521723886214.deletedelta
> -rw-rw-r--+ 3 hdfs hive 87989 2018-03-22 18:20
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-1_batchno0-0-1521723019528.deletedelta
> -rw-r--r-- 3 hdfs hive 0 2018-03-22 19:35
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-1_batchno0-0-1521723886214.deletedelta
> -rw-rw-r--+ 3 hdfs hive 87989 2018-03-22 18:20
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-2_batchno0-0-1521723019528.deletedelta
> -rw-r--r-- 3 hdfs hive 0 2018-03-22 19:35
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-2_batchno0-0-1521723886214.deletedelta
>
> -----------------------------------------------------------
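>
> The 0-byte delta files in the listing above are what appear to trigger the
> ArrayIndexOutOfBoundsException: 0 in getLocations: a zero-length file has
> no block locations, so indexing the first location fails. A minimal Scala
> sketch of that behaviour (the path is taken from the listing; the guard in
> the trailing comment is only an illustration, not the actual CarbonData
> code):
>
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.Path
>
> // For a 0-byte file, getFileBlockLocations returns an empty array, so any
> // code that indexes locations(0) throws ArrayIndexOutOfBoundsException: 0.
> val path = new Path("/user/hive/warehouse/carbon.store/public/c_compact4/" +
>   "Fact/Part0/Segment_0/part-0-0_batchno0-0-1521723886214.deletedelta")
> val fs = path.getFileSystem(new Configuration())
> val status = fs.getFileStatus(path)
> val locations = fs.getFileBlockLocations(status, 0, status.getLen)
> // locations.isEmpty is true here; a guard such as
> //   if (locations.isEmpty) Array.empty[String] else locations(0).getHosts
> // would avoid the crash.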
>
> How the issue was reproduced:
> Writing the content of the delete delta fails, but the deletedelta file is
> created successfully. The failure happens during horizontal compaction (a
> space quota was added in HDFS so that the file could be created
> successfully but the write to it would fail).
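>
> Roughly, the reproduce sequence in spark-shell looks like the sketch below
> (table and column names are taken from the trace above; the quota value,
> the predicates and the update values are only illustrative):
>
> // Shrink the space quota on the table directory first, e.g.
> //   hdfs dfsadmin -setSpaceQuota 1m \
> //     /user/hive/warehouse/carbon.store/public/c_compact4
> // so that new delete delta files can still be created but their content
> // cannot be written.
>
> // Repeated IUD operations on the same segment trigger horizontal
> // compaction of the delete deltas, which now fails mid-write and leaves
> // the 0-byte *.deletedelta files shown in the listing above.
> carbon.sql("UPDATE public.c_compact4 SET (nick)=('n1') WHERE age = '20'")
> carbon.sql("DELETE FROM public.c_compact4 WHERE gender = 'f'")
> carbon.sql("UPDATE public.c_compact4 SET (nick)=('n2') WHERE age = '21'")
>
> // Any later scan of the segment then fails:
> carbon.sql("select count(*) from public.c_compact4").show
> // -> java.io.IOException: Problem in loading segment blocks
>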
> *The following points need to be handled to fix this issue.*
>
> 1. When horizontal compaction fails, the 0-byte delete delta file should be
> deleted; currently it is not. This is the cleanup part of a failed
> horizontal compaction.
> 2. A 0-byte delete delta should not be considered while reading (we can
> discuss this solution further; a sketch of the check follows this list).
> Currently the tablestatus file has the entry for the deletedelta timestamp.
> 3. If a delete is in progress, the file has been created (the NameNode has
> an entry for it) but the data is still being written (not yet flushed), and
> a select query is triggered at the same time, then the query will fail, so
> this scenario also needs to be handled.
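>
> A minimal sketch of the check in point 2, using the plain Hadoop FileSystem
> API (the helper name is hypothetical and this is not the actual CarbonData
> code path):
>
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.Path
>
> // Hypothetical helper: list the delete delta files of a segment and drop
> // any 0-byte files left behind by a failed horizontal compaction, so the
> // reader never asks for the block locations of an empty file.
> def nonEmptyDeleteDeltas(segmentDir: String, conf: Configuration): Array[Path] = {
>   val dir = new Path(segmentDir)
>   val fs = dir.getFileSystem(conf)
>   fs.listStatus(dir)
>     .filter(_.getPath.getName.endsWith(".deletedelta"))
>     .filter(_.getLen > 0)   // point 2: ignore truncated 0-byte delta files
>     .map(_.getPath)
> }
>
> The same length check could also drive the cleanup in point 1 (deleting the
> 0-byte files when horizontal compaction fails).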
>
> @dev: Please let me know if any other details are needed.
>
> Thanks
> Babu
>
>
>
> --
> Sent from:
>
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/