XiaoWen created CARBONDATA-3905:
-----------------------------------

             Summary: When there are many segment files presto query fail
                 Key: CARBONDATA-3905
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3905
             Project: CarbonData
          Issue Type: Bug
          Components: presto-integration
    Affects Versions: 2.0.0
            Reporter: XiaoWen


test case1

Insert data with:

    df.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) => {
        ...
        val cond = $"B.id".isin(df.select(col = "id").as[Int].collect: _*)
        target.as("A")
          .merge(df.as("B"), "A.id = B.id")
          .whenMatched(cond)
          .updateExpr(Map("name" -> "B.name", "city" -> "B.city", "age" -> "B.age"))
          .whenNotMatched(cond)
          .insertExpr(Map("id" -> "B.id", "name" -> "B.name", "city" -> "B.city", "age" -> "B.age"))
          .execute()
        ...
    }}.outputMode("update").trigger(Trigger.ProcessingTime("3600 seconds")).start()

After a few hours, a lot of segment files have been generated.

When I then try to query with Presto, a query with a single condition works, but a query with multiple conditions fails.

    select name from test_table // ok
    select name from test_table where name = 'joe' // ok
    select name from test_table where name='joe' AND age > 25; // query failed
    select name from test_table where name='joe' AND age > 25 AND city ='shenzhen'; // query failed

I have also tried a 'major' compaction of the segment files to reduce the number of segments, but I still cannot query successfully.

presto server logs

    java.lang.IllegalArgumentException: Invalid position 0 in block with 0 positions
        at io.prestosql.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:62)
        at io.prestosql.spi.block.AbstractVariableWidthBlock.checkReadablePosition(AbstractVariableWidthBlock.java:160)
        at io.prestosql.spi.block.AbstractVariableWidthBlock.isNull(AbstractVariableWidthBlock.java:154)
        at io.prestosql.spi.block.LazyBlock.isNull(LazyBlock.java:248)
        at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
        at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
        at io.prestosql.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115)
        at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:254)
        at io.prestosql.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:246)
        at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
        at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
        at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
        at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
        at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
        at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
        at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
        at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
        at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
        at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:278)
        at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
        at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
        at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
        at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
        at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
        at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
        at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
        at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
        at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
        at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
        at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
        at io.prestosql.operator.Driver.processInternal(Driver.java:379)
        at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
        at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
        at io.prestosql.operator.Driver.processFor(Driver.java:276)
        at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
        at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
        at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
        at io.prestosql.$gen.Presto_316____20200623_163219_1.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


test case2

When I import directly:

    INSERT OVERWRITE TABLE test_table SELECT * FROM other_table

I found only a few segment files (about 3), and all of the queries succeed:

    select name from test_table // ok
    select name from test_table where name = 'joe' // ok
    select name from test_table where name='joe' AND age > 25; // ok
    select name from test_table where name='joe' AND age > 25 AND city ='shenzhen'; // ok



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
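
For readers unfamiliar with the exception in the trace above: the top frames show the generated page filter calling isNull on a block, which fails a bounds check because position 0 was requested from a block that reports 0 positions. The sketch below only illustrates that kind of check. The class BlockPositionCheck and the helper tryRead are hypothetical names invented for this illustration, and the method body is an assumption about what such a check looks like, not the Presto source. It says nothing about why the block is empty in test case1.

```java
public class BlockPositionCheck {
    // Assumed shape of a position bounds check; it mirrors the message in the
    // reported trace but is not copied from Presto.
    static void checkValidPosition(int position, int positionCount) {
        if (position < 0 || position >= positionCount) {
            throw new IllegalArgumentException(String.format(
                    "Invalid position %s in block with %s positions", position, positionCount));
        }
    }

    // Hypothetical helper: returns null when the position is readable,
    // otherwise the message of the exception thrown by the check.
    static String tryRead(int position, int positionCount) {
        try {
            checkValidPosition(position, positionCount);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A block with 3 positions: position 0 is readable.
        System.out.println(tryRead(0, 3));
        // A block with 0 positions: any position, including 0, is rejected.
        System.out.println(tryRead(0, 0));
    }
}
```

Under these assumptions, the second call produces the same message as the reported failure, which suggests the filter was evaluated against a column block that carried no rows while the filter still expected at least one.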