[DISCUSSION] Geo spatial index algorithm improvement and UDFs enhancement

haomarch
Hi Community,

CarbonData currently supports a geo spatial index and one query UDF, 'InPolygon'.
We plan to optimize the spatial index feature in three ways:

1 reduce the number of table properties required when creating a geo table;
2 add more UDFs and support more complex query scenarios;
3 allow the user to supply the spatial index value on 'LOAD' and 'INSERT INTO';
Carbon will still generate the value of the spatial index column internally
when the user does not provide one.


I have attached an initial v1 design document, 'CarbonData Spatial Index Design
Doc.docx', and a UDF interface design document, 'Carbon Geo UDF Enhancement
Interface Design.docx'. Please review them and share your comments and suggestions.

CarbonData_Spatial_Index_Design_Doc.docx
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t431/CarbonData_Spatial_Index_Design_Doc.docx>  
Carbon_Geo_UDF_Enhancement_Interface_Design.docx
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t431/Carbon_Geo_UDF_Enhancement_Interface_Design.docx>  

Thanks,

Regards,
Shen Jiayu



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Geo spatial index algorithm improvement and UDFs enhancement

Ajantha Bhat
Hi Shen Jiayu,
This is an interesting feature; thanks for proposing it.

+1 from my side for the high-level design.

I have a few suggestions and questions.
a) It would be better to separate the new UDF and utility UDF PR from the
algorithm improvement PR, for ease of review and maintainability.
b) The union, intersection, and difference of polygons can be computed during
filter expression creation, and the final polygon coordinates can then be sent
to Carbon as one range filter.
c) About the algorithm improvement: I saw that you have removed a few
parameters such as ‘minLongitude’, ‘maxLongitude’, ‘minLatitude’, and
‘maxLatitude’. Has anything else changed? Can you describe in more detail what
changes were made to improve the algorithm?
d) Please capture performance results with and without the algorithm
changes.
e) You also mentioned supporting a user-supplied Geohash column during load.
In that case there is no need to configure any spatial index properties in the
table properties, right?
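
To illustrate point b), here is a minimal sketch, assuming each polygon has already been resolved to a list of inclusive spatial index ranges. It combines ranges rather than polygon geometry, which is a simplification of what is suggested above. The names `normalize`, `union`, `intersection`, and `difference` are hypothetical and are not CarbonData APIs.

```python
from typing import List, Tuple

# Inclusive [start, end] range of spatial index values (assumed representation).
Range = Tuple[int, int]


def normalize(ranges: List[Range]) -> List[Range]:
    """Sort ranges and merge those that overlap or are adjacent."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: List[Range], b: List[Range]) -> List[Range]:
    """Index values covered by either input."""
    return normalize(a + b)


def intersection(a: List[Range], b: List[Range]) -> List[Range]:
    """Index values covered by both inputs (two-pointer sweep)."""
    a, b = normalize(a), normalize(b)
    out: List[Range] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def difference(a: List[Range], b: List[Range]) -> List[Range]:
    """Index values covered by a but not by b."""
    b = normalize(b)
    out: List[Range] = []
    for start, end in normalize(a):
        cur = start
        for b_start, b_end in b:
            if b_end < cur:
                continue
            if b_start > end:
                break
            if b_start > cur:
                out.append((cur, b_start - 1))
            cur = b_end + 1
            if cur > end:
                break
        if cur <= end:
            out.append((cur, end))
    return out


# Made-up ranges for two polygons, for illustration only.
poly_a = [(1, 5), (10, 20)]
poly_b = [(4, 12), (30, 40)]
combined = union(poly_a, poly_b)  # a single merged range list for the scan
```

With this approach the scan receives one merged range list per query, however many polygons the filter expression combines.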

Thanks,
Ajantha

On Mon, Dec 14, 2020 at 9:18 PM haomarch <[hidden email]> wrote:

> Hi Community,
>
> Now carbondata supports geo spatial index and one query UDF 'InPolygon'.
> We plan to optimize the Spatial index feature with three points:
>
> 1 reduce the parameters of table properties when creating geo table;
> 2 add more UDFs and support more complex query scenario;
> 3 allow user to define the spatial index when 'LOAD' and 'INSERT INTO', and
> carbon will still generated the value of spatial index column internally
> when user does not give.
>
>
> I have added an initial v1 design document 'CarbonData Spatial Index Design
> Doc.docx' and UDF interface design document 'Carbon Geo UDF Enhancement
> Interface Design.docx', please check and give comments/inputs/suggestions.
>
> CarbonData_Spatial_Index_Design_Doc.docx
> <http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t431/CarbonData_Spatial_Index_Design_Doc.docx>
> Carbon_Geo_UDF_Enhancement_Interface_Design.docx
> <http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t431/Carbon_Geo_UDF_Enhancement_Interface_Design.docx>
>
> Thanks,
>
> Regards,
> Shen Jiayu

Re: [DISCUSSION] Geo spatial index algorithm improvement and UDFs enhancement

David CaiQiang
+1



-----
Best Regards
David Cai

Re: [DISCUSSION] Geo spatial index algorithm improvement and UDFs enhancement

VenuReddy
In reply to this post by haomarch
+1

Good to see Geo enhancements.

Just a few points/queries:
1. The util UDFs seem to take origin latitude and grid size as arguments as well.
Shall we inherit them from the table specified in the query during query
processing? That would probably avoid invalid or inconsistent origin latitude
and grid size values being passed as UDF arguments (i.e., values that differ
from those in the table properties).
2. Regarding the point *"Allowing flexibility to user to specify the
spatial index value when 'LOAD' and 'INSERT INTO' without generating it
implicitly based on the configured table properties(i.e., grid size, origin
latitude etc)"*:
Wouldn't the query results vary when the user-supplied spatial index values
differ from the generated ones? Also, for insert into target_table
select * from source_table, source_table and target_table may not
necessarily have the same geo-specific table properties. We may need to
generate the spatial index value based on the target_table properties.
3. After these algorithm improvements, do polygon query results match those of
the current version? If not, I suggest capturing the differences in the doc.
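
To make points 1 and 2 concrete, here is a minimal sketch, assuming a simple fixed-size grid anchored at an origin latitude. It is not CarbonData's actual index algorithm; the function `grid_cell`, the property keys `origin_latitude` and `grid_size`, and the metres-per-degree constant are all hypothetical and chosen only for illustration.

```python
import math

# Approximate length of one degree of latitude in metres
# (an assumption made for illustration only).
METERS_PER_DEGREE = 111_320.0


def grid_cell(lon, lat, props):
    """Map a point to a (column, row) grid cell.

    `props` stands in for the table properties; its keys are hypothetical.
    Reading the parameters from the table, instead of from UDF arguments,
    means every caller derives cells from the same configuration.
    """
    origin_lat = props["origin_latitude"]
    grid_size = props["grid_size"]  # cell edge length in metres
    lat_step = grid_size / METERS_PER_DEGREE
    lon_step = grid_size / (METERS_PER_DEGREE * math.cos(math.radians(origin_lat)))
    return math.floor(lon / lon_step), math.floor((lat - origin_lat) / lat_step)


# Two tables that differ only in grid size (made-up values).
source_props = {"origin_latitude": 39.8, "grid_size": 50}
target_props = {"origin_latitude": 39.8, "grid_size": 100}
point = (116.3, 39.9)

source_cell = grid_cell(*point, source_props)
target_cell = grid_cell(*point, target_props)

# The same point falls into different cells under the two configurations,
# so an index value carried over from source_table would not match what
# target_table expects; it has to be regenerated from target_props.
print(source_cell, target_cell)
```

Under these assumptions, an index value computed with one set of properties cannot be compared with values computed under another, which is the inconsistency both points are concerned with.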

Regards,
Venu