[jira] [Updated] (CARBONDATA-3905) When there are many segment files presto query fail

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiaoWen updated CARBONDATA-3905:
--------------------------------
    Description:
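Test case 1

Data is written to the table by an hourly Structured Streaming job that merges each micro-batch into the target table: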
{code:scala}
df.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // ...
    val cond = $"B.id".isin(df.select("id").as[Int].collect: _*)
    target.as("A")
      .merge(df.as("B"), "A.id = B.id")
      .whenMatched(cond)
      .updateExpr(Map("name" -> "B.name", "city" -> "B.city", "age" -> "B.age"))
      .whenNotMatched(cond)
      .insertExpr(Map("id" -> "B.id", "name" -> "B.name", "city" -> "B.city", "age" -> "B.age"))
      .execute()
    // ...
}.outputMode("update").trigger(Trigger.ProcessingTime("3600 seconds")).start()
{code}
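For readers less familiar with the merge API, the job above performs an upsert on each micro-batch: a source row whose id matches a target row updates name, city and age, and any other source row is inserted. As written, cond checks B.id against the ids of the same df that is used as the merge source, so it holds for every source row and the sketch leaves it out. This is a simplified in-memory model using plain Java maps, not how CarbonData executes the merge; the Person class, the upsert method and the sample rows are hypothetical.

{code:java}
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UpsertSketch {
    // Hypothetical row type mirroring the columns used in the job: id, name, city, age.
    static final class Person {
        final int id;
        final String name;
        final String city;
        final int age;

        Person(int id, String name, String city, int age) {
            this.id = id;
            this.name = name;
            this.city = city;
            this.age = age;
        }

        @Override
        public String toString() {
            return id + "," + name + "," + city + "," + age;
        }
    }

    // Applies one micro-batch to the target, keyed on id (the "A.id = B.id" join condition).
    static void upsert(Map<Integer, Person> target, List<Person> batch) {
        for (Person source : batch) {
            if (target.containsKey(source.id)) {
                // whenMatched -> updateExpr: take name, city and age from the source row.
                target.put(source.id, new Person(source.id, source.name, source.city, source.age));
            } else {
                // whenNotMatched -> insertExpr: add the source row as a new target row.
                target.put(source.id, source);
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, Person> target = new LinkedHashMap<>();
        target.put(1, new Person(1, "joe", "beijing", 24));

        upsert(target, Arrays.asList(
                new Person(1, "joe", "shenzhen", 26),   // id 1 exists: updated
                new Person(2, "amy", "shanghai", 30))); // id 2 is new: inserted

        // Prints "1,joe,shenzhen,26" then "2,amy,shanghai,30".
        target.values().forEach(System.out::println);
    }
}
{code}

The sketch keeps the update and insert branches separate only to mirror the two merge clauses; a single put would behave the same.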
After a few hours, a large number of segment files have accumulated. When I then query the table through Presto, queries with a single filter condition succeed, but queries with multiple conditions fail:

select name from test_table; -- ok
select name from test_table where name = 'joe'; -- ok
select name from test_table where name = 'joe' AND age > 25; -- query fails
select name from test_table where name = 'joe' AND age > 25 AND city = 'shenzhen'; -- query fails

I also tried a 'major' compaction to reduce the number of segments, but the queries still fail.

Presto server log:

java.lang.IllegalArgumentException: Invalid position 0 in block with 0 positions
 at io.prestosql.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:62)
 at io.prestosql.spi.block.AbstractVariableWidthBlock.checkReadablePosition(AbstractVariableWidthBlock.java:160)
 at io.prestosql.spi.block.AbstractVariableWidthBlock.isNull(AbstractVariableWidthBlock.java:154)
 at io.prestosql.spi.block.LazyBlock.isNull(LazyBlock.java:248)
 at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
 at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
 at io.prestosql.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115)
 at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:254)
 at io.prestosql.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:246)
 at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
 at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
 at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
 at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
 at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
 at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:278)
 at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
 at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
 at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
 at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
 at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
 at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
 at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
 at io.prestosql.operator.Driver.processInternal(Driver.java:379)
 at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
 at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
 at io.prestosql.operator.Driver.processFor(Driver.java:276)
 at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
 at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
 at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
 at io.prestosql.$gen.Presto_316____20200623_163219_1.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
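The top frames show where the query dies: the generated PageFilter asks a lazily loaded variable-width (string) block whether position 0 is null, and the position check rejects it because the block holds 0 positions. One plausible reading, not confirmed anywhere in this issue, is that for some pages a string filter column comes back with fewer positions than the page's row count. The sketch below models only that bounds check in plain Java; Block, checkValidPosition and filterPage are simplified hypothetical stand-ins rather than Presto's SPI classes, and only the exception message is taken from the log.

{code:java}
import java.util.List;

public class PositionCheckSketch {
    // Rejects any position outside [0, positionCount); message format copied from the log above.
    static void checkValidPosition(int position, int positionCount) {
        if (position < 0 || position >= positionCount) {
            throw new IllegalArgumentException(String.format(
                    "Invalid position %s in block with %s positions", position, positionCount));
        }
    }

    // Hypothetical, simplified column block holding string values.
    static final class Block {
        private final List<String> values;

        Block(List<String> values) {
            this.values = values;
        }

        int getPositionCount() {
            return values.size();
        }

        boolean isNull(int position) {
            checkValidPosition(position, getPositionCount());
            return values.get(position) == null;
        }

        String getString(int position) {
            checkValidPosition(position, getPositionCount());
            return values.get(position);
        }
    }

    // Evaluates "column = expected" for every row the page claims to contain.
    static int filterPage(int pageRowCount, Block column, String expected) {
        int matches = 0;
        for (int position = 0; position < pageRowCount; position++) {
            // Like the generated filter in the trace, test for null before reading the value.
            if (!column.isNull(position) && column.getString(position).equals(expected)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Block size agrees with the page row count: the filter works and prints 1.
        System.out.println(filterPage(2, new Block(List.of("joe", "amy")), "joe"));

        // The page claims 2 rows, but the column block came back with 0 positions.
        try {
            filterPage(2, new Block(List.of()), "joe");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

Running it prints 1 and then "Invalid position 0 in block with 0 positions", the same message as in the server log.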

 

Test case 2

When I load the data directly with a single statement instead:
INSERT OVERWRITE TABLE test_table SELECT * FROM other_table

only a few segment files (about 3) are created, and all four queries succeed:

select name from test_table; -- ok
select name from test_table where name = 'joe'; -- ok
select name from test_table where name = 'joe' AND age > 25; -- ok
select name from test_table where name = 'joe' AND age > 25 AND city = 'shenzhen'; -- ok


> When there are many segment files presto query fail
> ---------------------------------------------------
>
>                 Key: CARBONDATA-3905
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3905
>             Project: CarbonData
>          Issue Type: Bug
>          Components: presto-integration
>    Affects Versions: 2.0.0
>            Reporter: XiaoWen
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)