Hi community,
Currently, I am working on supporting SI (secondary index) with the complex array type. To support it, we must decide how to store the Array type in the SI table to get better performance.

Solution 1:
Store the Array as a complex (ARRAY) type in the secondary index table.

Cons:
Pruning arrays with huge data on both the SI and the main table will be an overhead and might not give much performance benefit.

Solution 2:
Flatten the Array data and store it as its child data type in the secondary index table. Compared to solution 1, this can provide a benefit in some scenarios (I have raised a PR with this solution). As a first step, only one level of Array will be supported.

With this solution, I have also added support for pruning SI on rowId (keeping the position id down to the rowId instead of the blockletId) with complex types, for better performance.

Cons:
With this solution, a query with more than one array_contains filter combined with expressions like AND cannot be supported on SI, since the array data will be flattened in SI.

Inputs and suggestions for any new solution or changes to the above solutions are most welcome.

Regards,
Indhumathi

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
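To make the solution 2 trade-off concrete, here is a minimal Python sketch, assuming a toy main table keyed by hypothetical (blocklet_id, row_id) positions. The names MAIN_TABLE, SIRow, build_flattened_si, array_contains_positions and and_within_one_si_row are illustrative only; they are not CarbonData APIs. The sketch flattens each array into one SI row per element with a row-level position reference, answers a single array_contains with one pass over the SI, and shows why an AND of two array_contains filters cannot be evaluated row by row on the flattened SI.

```python
from dataclasses import dataclass

# Hypothetical main table: (blocklet_id, row_id) -> value of an ARRAY<STRING> column.
MAIN_TABLE = {
    (0, 0): ["a", "b"],
    (0, 1): ["b", "c"],
    (1, 0): ["a", "c"],
}


@dataclass(frozen=True)
class SIRow:
    """One row of a hypothetical flattened SI table (solution 2)."""
    value: str        # a single array element, stored as the child data type
    blocklet_id: int  # position reference into the main table...
    row_id: int       # ...kept down to the row level instead of the blocklet


def build_flattened_si(main_table):
    """Emit one SI row per array element, sorted by the indexed value."""
    rows = [
        SIRow(elem, blocklet_id, row_id)
        for (blocklet_id, row_id), arr in main_table.items()
        for elem in arr
    ]
    return sorted(rows, key=lambda r: (r.value, r.blocklet_id, r.row_id))


def array_contains_positions(si, value):
    """array_contains(col, value): collect the main-table rows whose
    flattened SI entries match. (A real SI would use its sort order;
    this sketch simply scans.)"""
    return {(r.blocklet_id, r.row_id) for r in si if r.value == value}


def and_within_one_si_row(si, v1, v2):
    """Naive AND of two array_contains filters applied to each SI row.
    Every flattened row holds exactly one element, so for v1 != v2
    this is always empty, even when a main-table array holds both."""
    return {
        (r.blocklet_id, r.row_id)
        for r in si
        if r.value == v1 and r.value == v2
    }


si = build_flattened_si(MAIN_TABLE)
print(array_contains_positions(si, "a"))    # rows (0, 0) and (1, 0)
print(and_within_one_si_row(si, "a", "b"))  # set(), although row (0, 0) is ["a", "b"]
```

The row-level reference lets a single filter prune straight to rows, but the per-element layout is exactly what breaks AND across elements of the same array.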
+1 for solution2
Can we support more than one array_contains filter by using an SI join (as is done for SI on primitive data types)?

Best Regards,
David Cai
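The SI join asked about here can be sketched as follows, assuming the same kind of flattened, row-level SI as in solution 2: each array_contains filter is evaluated against the SI on its own, and the resulting sets of (blocklet_id, row_id) positions are intersected. The data and the names FLATTENED_SI, positions_for and array_contains_all are hypothetical, not CarbonData code.

```python
from functools import reduce

# Hypothetical flattened SI rows (solution 2): (element, blocklet_id, row_id).
FLATTENED_SI = [
    ("a", 0, 0), ("b", 0, 0),  # main-table row (0, 0) holds ["a", "b"]
    ("b", 0, 1), ("c", 0, 1),  # main-table row (0, 1) holds ["b", "c"]
    ("a", 1, 0), ("c", 1, 0),  # main-table row (1, 0) holds ["a", "c"]
]


def positions_for(si, value):
    """One pass over the SI for a single array_contains(col, value)."""
    return {(blk, rid) for elem, blk, rid in si if elem == value}


def array_contains_all(si, values):
    """array_contains(col, v1) AND array_contains(col, v2) AND ...
    Evaluate each filter separately, then intersect the row positions."""
    if not values:
        raise ValueError("need at least one array_contains value")
    per_filter = [positions_for(si, v) for v in values]  # one SI pass per filter
    return reduce(set.intersection, per_filter)


print(array_contains_all(FLATTENED_SI, ["a", "b"]))  # {(0, 0)}
```

The intersection restores AND semantics on a flattened SI, at the cost of one SI pass per filter condition; for OR, `set.union` would take the place of `set.intersection`.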
Hi David & Indhumathi,
Storing an Array of String as just a String column in SI by flattening it [with a row-level position reference] has the following drawbacks:
* It can result in slow performance in case of multiple array_contains() filters or multiple array[0] = 'x' filters.
* The join solution mentioned can result in multiple scans (one for every complex filter condition), which can slow down SI performance.
* Row-level SI can slow down SI performance when the filter returns a huge number of results.
* To support multiple SIs on a single table, the complex SI will use a row-level position reference while the primitive SI will use a blocklet-level position reference, so the join needs extra logic and time.
* Solution 2 cannot support SI on struct columns in the future, so it cannot be a generic solution.

Considering the above points, *solution 2 is a very good solution if only one filter exists* on the complex column, *but it is not a good solution for all scenarios.*
*So, I would have to go with solution 1, or wait for other people's opinions or new solutions.*

Thanks,
Ajantha

On Thu, Jul 30, 2020 at 1:19 PM David CaiQiang <[hidden email]> wrote:
> +1 for solution2
>
> Can we support more than one array_contains by using SI join (like SI on
> primitive data type)?
>
> -----
> Best Regards
> David Cai
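The extra join logic for combining a row-level complex-column SI with a blocklet-level primitive-column SI can be sketched like this, under the assumption that both SIs hand back their pruning results as Python sets. The names and data (COMPLEX_SI_HITS, PRIMITIVE_SI_HITS, to_blocklet_level, and_at_blocklet_level, and_at_row_level) are hypothetical. Intersecting at blocklet granularity loses the row-level precision; keeping row positions instead still requires the primitive filter to be re-checked on those rows, because a blocklet-level hit does not guarantee that every row in the blocklet matches.

```python
# Hypothetical pruning results for a query with one filter on an array
# column (flattened, row-level SI) AND one filter on a primitive column
# (blocklet-level SI), both on the same main table.
COMPLEX_SI_HITS = {(0, 0), (0, 5), (2, 3)}  # (blocklet_id, row_id)
PRIMITIVE_SI_HITS = {0, 1}                  # blocklet_id only


def to_blocklet_level(row_refs):
    """Coarsen row-level position references to blocklet ids."""
    return {blocklet_id for blocklet_id, _ in row_refs}


def and_at_blocklet_level(row_refs, blocklet_refs):
    """Blocklets that survive both SIs; row precision is lost."""
    return to_blocklet_level(row_refs) & blocklet_refs


def and_at_row_level(row_refs, blocklet_refs):
    """Row positions whose blocklet survives the primitive SI. The primitive
    filter must still be re-evaluated on these rows during the main-table
    scan, since a blocklet-level hit does not mean every row matches."""
    return {(blk, rid) for blk, rid in row_refs if blk in blocklet_refs}


print(and_at_blocklet_level(COMPLEX_SI_HITS, PRIMITIVE_SI_HITS))  # {0}
```

Either way, the two granularities have to be reconciled before the main-table scan, which is the extra step a single-granularity design would avoid.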
+1 for solution 2
Regards,
Kumar Vishal

On Thu, 30 Jul 2020 at 3:19 PM, Indhumathi <[hidden email]> wrote:
> Hi community,
>
> Currently, i am working on supporting SI with complex array type.
> [...]