Apache CarbonData Dev Mailing List archive

Re: [Discussion]Presto Queries leveraging Secondary Index

Posted by kunalkapoor on Feb 25, 2021; 4:51am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Discussion-Presto-Queries-leveraging-Secondary-Index-tp105291p106456.html

+1 on using the index server to leverage SI. As discussed earlier, we
would need a segment UDF to enable selective segment reading instead of the
current implementation. The existing setSegmentsToRead API should be
removed later as well.

Please share the design after your POC.

On Mon, Jan 18, 2021 at 9:42 AM akashrn5 <[hidden email]> wrote:

> Hi Venu,
>
> Thanks for the suggestions.
>
> 1. Option 1 is not a good idea; I think performance will be bad.
> 2. For option 2, we already have other indexes, Lucene and bloom, where
> distributed pruning happens. Lucene is also an index stored along with the
> table, not as a separate table like SI, so we scan Lucene in a distributed
> job and then return the index for the filter expression. Similarly, we can
> call SI to scan and prune, but since that needs a Spark job, the index
> server is the only option.
> We can use it for scanning, but I am afraid it may impact other
> concurrent queries. I would suggest going for a POC with the index server
> first, which will show us any other bottlenecks with this approach; then we
> can decide and start the design.
>
> If you have already done a POC and have some results and a design ready,
> we can review them.
>
> Thanks
>
> Regards
> Akash