[jira] [Assigned] (CARBONDATA-1032) NumberFormatException and NegativeArraySizeException for select with in clause filter limit for unsafe true configuration


Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srigopal Mohanty reassigned CARBONDATA-1032:
--------------------------------------------

    Assignee: Srigopal Mohanty

> NumberFormatException and NegativeArraySizeException for select with in clause filter limit for unsafe true configuration
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-1032
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1032
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>    Affects Versions: 1.1.0
>         Environment: 3 node cluster SUSE 11 SP4
>            Reporter: Chetan Bhat
>            Assignee: Srigopal Mohanty
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> carbon.properties is configured as below:
> carbon.allowed.compaction.days = 2
> carbon.enable.auto.load.merge = false
> carbon.compaction.level.threshold = 3,2
> carbon.timestamp.format = yyyy-MM-dd
> carbon.badRecords.location = /tmp/carbon
> carbon.numberof.preserve.segments = 2
> carbon.sort.file.buffer.size = 20
> max.query.execution.time = 60
> carbon.number.of.cores.while.loading = 8
> carbon.storelocation =hdfs://hacluster/opt/CarbonStore
> enable.data.loading.statistics = true
> enable.unsafe.sort = true
> offheap.sort.chunk.size.inmb = 128
> sort.inmemory.size.inmb = 30720
> carbon.enable.vector.reader=true
> enable.unsafe.in.query.processing=true
> enable.query.statistics=true
> carbon.blockletgroup.size.in.mb=128
> high.cardinality.identify.enable=TRUE
> high.cardinality.threshold=10000
> high.cardinality.value=1000
> high.cardinality.row.count.percentage=40
> carbon.data.file.version=2
> carbon.major.compaction.size=2
> carbon.enable.auto.load.merge=FALSE
> carbon.numberof.preserve.segments=1
> carbon.allowed.compaction.days=1
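The property list above sets three keys twice with different values or casing: carbon.allowed.compaction.days (2, then 1), carbon.numberof.preserve.segments (2, then 1), and carbon.enable.auto.load.merge (false, then FALSE). As a minimal sketch, assuming the file is read with the standard java.util.Properties loader (the report does not confirm how CarbonData parses it), the later entry replaces the earlier one. The class and method names below are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of CarbonData.
public class DuplicateKeyCheck {

    // Parses properties-format text and returns the value that is in effect for one key.
    // Assumption: the real file is loaded the same way, via java.util.Properties.
    static String effectiveValue(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Duplicate entries copied from the reported configuration.
        String text = String.join("\n",
            "carbon.allowed.compaction.days = 2",
            "carbon.numberof.preserve.segments = 2",
            "carbon.numberof.preserve.segments=1",
            "carbon.allowed.compaction.days=1");

        // With java.util.Properties, each later entry overwrites the earlier one.
        System.out.println(effectiveValue(text, "carbon.allowed.compaction.days"));
        System.out.println(effectiveValue(text, "carbon.numberof.preserve.segments"));
    }
}
```

Under that assumption, the values in effect for this report would be the later ones in the list, which is worth keeping in mind when trying to reproduce the failure.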
> The user creates a table, loads 1535088 records, and executes a select with an IN clause filter and a limit.
> Actual Result :
> The select with an IN clause filter and limit fails with NumberFormatException and NegativeArraySizeException under the unsafe true configuration.
> 0: jdbc:hive2://172.168.100.199:23040> select * from flow_carbon_test4 where opp_bk in ('1491999999158','1491999999116','1491999999022','1491999999031')  and dt>='20140101' and dt <= '20160101' order by bal asc limit 1000;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2109.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2109.0 (TID 75120, linux-49, executor 2): java.lang.NegativeArraySizeException
>         at org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeBigDecimalMeasureChunkStore.getBigDecimal(UnsafeBigDecimalMeasureChunkStore.java:132)
>         at org.apache.carbondata.core.datastore.compression.decimal.CompressByteArray.getBigDecimalValue(CompressByteArray.java:94)
>         at org.apache.carbondata.core.datastore.dataholder.CarbonReadDataHolder.getReadableBigDecimalValueByIndex(CarbonReadDataHolder.java:38)
>         at org.apache.carbondata.core.scan.result.vector.MeasureDataVectorProcessor$DecimalMeasureVectorFiller.fillMeasureVectorForFilter(MeasureDataVectorProcessor.java:253)
>         at org.apache.carbondata.core.scan.result.impl.FilterQueryScannedResult.fillColumnarMeasureBatch(FilterQueryScannedResult.java:119)
>         at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.scanAndFillResult(DictionaryBasedVectorResultCollector.java:145)
>         at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.collectVectorBatch(DictionaryBasedVectorResultCollector.java:137)
>         at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:65)
>         at org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46)
>         at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:251)
>         at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:141)
>         at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:221)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>         at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:628)
>         at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>         at org.apache.spark.sql.execution.TakeOrderedAndProjectExec$$anonfun$5.apply(limit.scala:148)
>         at org.apache.spark.sql.execution.TakeOrderedAndProjectExec$$anonfun$5.apply(limit.scala:147)
>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>         at org.apache.spark.scheduler.Task.run(Task.scala:99)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)
> 0: jdbc:hive2://172.168.100.199:23040> select  *  from flow_carbon_test4 where  cus_ac  like '622262135067246539%'  and (txn_dte>='20150101' and txn_dte<='20160101') and txn_bk IN ('00000000000', '00000000001','00000000002') OR own_bk IN ('00000000424','00000001383','00000001942','00000001262') limit 1000;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 131.0 failed 4 times, most recent failure: Lost task 0.3 in stage 131.0 (TID 240, linux-51, executor 1): java.lang.NumberFormatException: Zero length BigInteger
>         at java.math.BigInteger.<init>(BigInteger.java:293)
>         at org.apache.carbondata.core.util.DataTypeUtil.byteToBigDecimal(DataTypeUtil.java:189)
>         at org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeBigDecimalMeasureChunkStore.getBigDecimal(UnsafeBigDecimalMeasureChunkStore.java:136)
>         at org.apache.carbondata.core.datastore.compression.decimal.CompressByteArray.getBigDecimalValue(CompressByteArray.java:94)
>         at org.apache.carbondata.core.datastore.dataholder.CarbonReadDataHolder.getReadableBigDecimalValueByIndex(CarbonReadDataHolder.java:38)
>         at org.apache.carbondata.core.scan.collector.impl.AbstractScannedResultCollector.getMeasureData(AbstractScannedResultCollector.java:104)
>         at org.apache.carbondata.core.scan.collector.impl.AbstractScannedResultCollector.fillMeasureData(AbstractScannedResultCollector.java:78)
>         at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.fillMeasureData(DictionaryBasedResultCollector.java:158)
>         at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.collectData(DictionaryBasedResultCollector.java:115)
>         at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:51)
>         at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:32)
>         at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:50)
>         at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
>         at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:31)
>         at org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator.<init>(ChunkRowIterator.java:41)
>         at org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:78)
>         at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:204)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.sql.CarbonDecoderRDD.compute(CarbonDictionaryDecoder.scala:538)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>         at org.apache.spark.scheduler.Task.run(Task.scala:99)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)
> Expected Result : The select with an IN clause filter and limit should execute successfully under the unsafe true configuration and display the correct result set without any exception.
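
Both stack traces fail inside UnsafeBigDecimalMeasureChunkStore.getBigDecimal: one while allocating an array, the other inside the BigInteger constructor. The sketch below is not CarbonData code. It assumes a hypothetical record layout (a 4-byte length, one scale byte, then the unscaled value bytes) purely to illustrate how reading a length-prefixed decimal from a wrong offset can surface as either exception. The class name, method names, and layout are all assumptions.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: a hypothetical length-prefixed decimal layout, not CarbonData's format.
public class DecimalDecodeSketch {

    // Assumed layout at 'offset': [int length][1 scale byte][length - 1 unscaled value bytes].
    static BigDecimal decode(ByteBuffer buf, int offset) {
        int length = buf.getInt(offset);
        // A negative length read from a wrong offset throws NegativeArraySizeException here.
        byte[] raw = new byte[length];
        for (int i = 0; i < length; i++) {
            raw[i] = buf.get(offset + 4 + i);
        }
        int scale = raw[0] & 0xFF;
        byte[] unscaled = Arrays.copyOfRange(raw, 1, raw.length);
        // An empty 'unscaled' array throws NumberFormatException ("Zero length BigInteger").
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    // Returns the decoded value as text, or the simple name of the exception that was thrown.
    static String outcome(ByteBuffer buf, int offset) {
        try {
            return decode(buf, offset).toString();
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(32);

        // Well-formed record at offset 0: length 3, scale 2, unscaled value 0x04D2 (1234).
        buf.putInt(0, 3).put(4, (byte) 2).put(5, (byte) 0x04).put(6, (byte) 0xD2);
        // Bytes at offset 8 read as a negative length.
        buf.putInt(8, -1);
        // Record at offset 16 holds only a scale byte, so no unscaled bytes remain.
        buf.putInt(16, 1).put(20, (byte) 0);

        System.out.println(outcome(buf, 0));
        System.out.println(outcome(buf, 8));
        System.out.println(outcome(buf, 16));
    }
}
```

If the actual cause is similar, a fix would likely need to correct the offset or length computation in the unsafe store rather than catch these exceptions, but the report itself does not identify the root cause.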



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)