Apache CarbonData Dev Mailing List archive

Re: [DISCUSSION] Support JOIN query with spatial index

Posted by Indhumathi on Mar 30, 2021; 12:55pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/DISCUSSION-Support-JOIN-query-with-spatial-index-tp107159p107289.html

Hello all,

Current design is based on Union of polygons identified from
polygon table.

Here, (Latitude and Longitude) pair is present in spatial table
and polygons define a dimension table.

Based on discussion with customer, need to change the design to
support IN_POLYGON_JOIN in below way.

Apply IN_POLYGON udf on each polygon identified from Polygon table
and apply aggregation/group by each polygon result.

For example:
Select sum(t1.col1),t2.polygonId
from table1 t1
inner join
(select polygon,polygonId from table2 where type='x') t2
on in_polygon_join(t1, t2.polygon)
group by t2.polygonId

table1:
------------+-------------------
Col1 + mygeohash |
------------+------------------
1 |01 |
2 |02 |
3 |03 |
4 |04 |
5 |05 |
6 |06 |
------------+------------------

table2:
-----------------+------------------+--------------
polygon + type | polygonId |
-----------------+------------------+--------------
1_Polygon() |x |1 |
2_Polygon() |y |2 |
3_Polygon() |x |3 |
4_Polygon() |r |4 |
------------+---------------------+----------------

If 1_polygon lies in range (0 & 1), 3_polygon in (5 & 6),
result could be like,
-----------------+--------------------
sum(t1.col1 + polygonId |
-----------------+--------------------
3 + 1 |
11 +3 |
-----------------+-------------------

To achieve this, a solution could be to, run query with IN_POLYGON
udf of each polygon and finally make a union of query result.

Will update design on further analysis.
Any opinion or suggestions are welcomed.

Thanks,
Indhumathi M

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/