Hi Community,
CarbonData currently supports a geospatial index and one query UDF, 'InPolygon'. We plan to optimize the spatial index feature in three ways:

1. Reduce the number of table properties required when creating a geo table.
2. Add more UDFs and support more complex query scenarios.
3. Allow the user to specify the spatial index value during 'LOAD' and 'INSERT INTO'; CarbonData will still generate the value of the spatial index column internally when the user does not provide one.

I have added an initial v1 design document, 'CarbonData Spatial Index Design Doc.docx', and a UDF interface design document, 'Carbon Geo UDF Enhancement Interface Design.docx'. Please review them and share your comments, inputs, and suggestions.

CarbonData_Spatial_Index_Design_Doc.docx
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t431/CarbonData_Spatial_Index_Design_Doc.docx>

Carbon_Geo_UDF_Enhancement_Interface_Design.docx
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t431/Carbon_Geo_UDF_Enhancement_Interface_Design.docx>

Thanks,
Regards,
Shen Jiayu

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Hi Shen Jiayu,
This is an interesting feature, thanks for proposing it. +1 from my side for the high-level design. I have a few suggestions and questions.

a) It would be better to separate the PR for the new UDFs and utility UDFs from the algorithm improvement PR, for ease of review and maintainability.
b) The union, intersection, and difference of polygons can be computed during filter expression creation, so that the final polygon coordinates are sent to Carbon as a single range filter.
c) About the algorithm improvement: I saw that you have removed a few parameters such as 'minLongitude', 'maxLongitude', 'minLatitude', and 'maxLatitude'. Has anything else changed? Can you describe in more detail what changes were made to improve the algorithm?
d) Please capture performance results with and without the algorithm changes.
e) You have also mentioned supporting a user-supplied Geohash column during load. In that case there is no need to configure any spatial index properties in the table properties, right?

Thanks,
Ajantha

On Mon, Dec 14, 2020 at 9:18 PM haomarch <[hidden email]> wrote:
> Hi Community,
> [...]
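Point (b) above can be illustrated with a minimal sketch. It assumes each polygon has already been converted into a list of inclusive spatial index ID ranges; that conversion, the `Range` type, and the function names are hypothetical and are not CarbonData APIs. Under that assumption, the set operations reduce to merging range lists before a single filter is built:

```python
from typing import List, Tuple

# Hypothetical representation: inclusive [start, end] of spatial index IDs.
Range = Tuple[int, int]


def normalize(ranges: List[Range]) -> List[Range]:
    """Sort ranges and merge any that overlap or are adjacent."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: List[Range], b: List[Range]) -> List[Range]:
    """IDs covered by a or b."""
    return normalize(a + b)


def intersection(a: List[Range], b: List[Range]) -> List[Range]:
    """IDs covered by both a and b (two-pointer sweep)."""
    a, b = normalize(a), normalize(b)
    out: List[Range] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def difference(a: List[Range], b: List[Range]) -> List[Range]:
    """IDs covered by a but not by b."""
    b = normalize(b)
    out: List[Range] = []
    for start, end in normalize(a):
        cur = start
        for b_start, b_end in b:
            if b_end < cur:
                continue  # this b range lies entirely before the cursor
            if b_start > end:
                break  # remaining b ranges lie entirely after this a range
            if b_start > cur:
                out.append((cur, b_start - 1))
            cur = b_end + 1
            if cur > end:
                break
        if cur <= end:
            out.append((cur, end))
    return out
```

With this shape, a query combining several polygons would hand the scan one merged range list instead of one filter per polygon. Whether the real implementation works on ID ranges or on polygon coordinates is a design decision for the proposal, not something this sketch settles.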
+1
Best Regards,
David Cai
In reply to this post by haomarch
+1
Good to see the Geo enhancements. Just a few points and queries:

1. The util UDFs seem to take the origin latitude and grid size as arguments as well. Shall we inherit them from the table specified in the query during query processing? That would probably avoid invalid or inconsistent origin latitude and grid size values being given as UDF arguments (i.e., values that differ from those in the table properties).

2. Regarding the point *"Allowing flexibility to user to specify the spatial index value when 'LOAD' and 'INSERT INTO' without generating it implicitly based on the configured table properties(i.e., grid size, origin latitude etc)"*: wouldn't the query results vary when the user-supplied spatial index values differ from the generated ones? Also, with insert into target_table select * from source_table, source_table and target_table may not necessarily have the same geo-specific table properties. We may need to generate the spatial index value based on the target_table properties.

3. After these algorithm improvements, do polygon query results still match those of the current version? If not, I suggest capturing the differences in the doc.

Regards,
Venu
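The concern in point 2 above can be shown with a small sketch. This is not CarbonData's actual index algorithm: it assumes a simplified grid measured in degrees from a hypothetical minimum corner, with cell coordinates interleaved into a Z-order value, and all names and parameters are illustrative. It shows only that the same point gets a different index value when the grid parameters differ, which is why values copied from a source table with other properties would not line up with the target table:

```python
import math
from typing import Tuple


def grid_cell(lon: float, lat: float, min_lon: float, min_lat: float,
              cell_deg: float) -> Tuple[int, int]:
    """Map a point to (column, row) of a uniform grid.

    Assumption: the cell size is given in degrees and the point lies
    at or beyond the (min_lon, min_lat) corner.
    """
    col = math.floor((lon - min_lon) / cell_deg)
    row = math.floor((lat - min_lat) / cell_deg)
    return col, row


def interleave(col: int, row: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of col and row (Z-order / Morton code)."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)      # column bit -> even position
        index |= ((row >> i) & 1) << (2 * i + 1)  # row bit -> odd position
    return index


def spatial_index(lon: float, lat: float, min_lon: float, min_lat: float,
                  cell_deg: float) -> int:
    """Hypothetical index value for a point under the given grid settings."""
    col, row = grid_cell(lon, lat, min_lon, min_lat, cell_deg)
    return interleave(col, row)


# Same point, two different grid sizes: the index values do not match.
source_value = spatial_index(3.0, 2.0, 0.0, 0.0, cell_deg=1.0)
target_value = spatial_index(3.0, 2.0, 0.0, 0.0, cell_deg=0.5)
```

Under these assumptions, regenerating the value from the target table's own properties during insert into ... select keeps the stored index consistent with what a polygon query on the target table would compute.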