Posted by
Akash R Nilugal (Jira) on
Nov 26, 2020; 9:29am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/jira-Updated-CARBONDATA-4051-Geo-spatial-index-algorithm-improvement-and-UDFs-enhancement-tp103633.html
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jiayu Shen updated CARBONDATA-4051:
-----------------------------------
Description:
The requirement comes from SEQ; the related algorithms are provided by the Discovery Team.
1. Replace the geohash encoding algorithm and reduce the number of properties required in CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
timevalue BIGINT,
longitude LONG,
latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
'SPATIAL_INDEX.mygeohash.type'='geohash',
'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
'SPATIAL_INDEX.mygeohash.gridSize'='50',
'SPATIAL_INDEX.mygeohash.conversionRatio'='1000000'){code}
2. Add geo query UDFs.
Query filter UDFs:
* _*InPolygonList(List<String> polygonList, OperationType opType)*_
* _*InPolylineList(List<String> polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList(List<Long[]> RangeList, OperationType opType)*_
*The operation type supports only:*
* *"OR", which calculates the union of two polygons*
* *"AND", which calculates the intersection of two polygons*
Geo util UDFs:
* _*GeoIdToGridXy(Long geoId) : Pair<Integer, Integer>*_
* _*LatLngToGeoId(Long latitude, Long longitude) : Long*_
* _*GeoIdToLatLng(Long geoId) : Pair<Double, Double>*_
* _*ToUpperLayerGeoId(Long geoId) : Long*_
* _*ToRangeList(String polygon) : List<Long[]>*_
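The "OR" and "AND" semantics above can be illustrated on plain range lists. The following is a minimal sketch, not CarbonData's implementation: it assumes each range is an inclusive (start, end) pair of geo IDs, and the names normalize, union, intersection and combine are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Tuple

# Assumption: a range is an inclusive (start, end) pair of geo IDs.
Range = Tuple[int, int]


def normalize(ranges: List[Range]) -> List[Range]:
    """Sort ranges and merge any that overlap or touch."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: List[Range], b: List[Range]) -> List[Range]:
    """'OR': every geo ID covered by either list."""
    return normalize(a + b)


def intersection(a: List[Range], b: List[Range]) -> List[Range]:
    """'AND': every geo ID covered by both lists."""
    a, b = normalize(a), normalize(b)
    result: List[Range] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            result.append((lo, hi))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def combine(range_lists: List[List[Range]], op_type: str) -> List[Range]:
    """Fold several range lists together with 'OR' or 'AND'."""
    if op_type not in ("OR", "AND"):
        raise ValueError("op_type must be 'OR' or 'AND'")
    if not range_lists:
        return []
    result = normalize(range_lists[0])
    for ranges in range_lists[1:]:
        if op_type == "OR":
            result = union(result, ranges)
        else:
            result = intersection(result, ranges)
    return result
```

For example, combining [(1, 5), (10, 15)] with [(4, 11), (20, 22)] gives [(1, 15), (20, 22)] under "OR" and [(4, 5), (10, 11)] under "AND".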
3. Currently, GeoID is a column created internally for spatial tables. This PR allows the GeoID column to be customized during LOAD/INSERT INTO. For example,
{code:java}
INSERT INTO geoTable SELECT 0,1575428400000,116285807,40084087;
It used to be as below; '855280799612' was generated internally:
+------------+-------------+---------+--------+
|mygeohash   |timevalue    |longitude|latitude|
+------------+-------------+---------+--------+
|855280799612|1575428400000|116285807|40084087|
+------------+-------------+---------+--------+
but now it is:
+------------+-------------+---------+--------+
|mygeohash   |timevalue    |longitude|latitude|
+------------+-------------+---------+--------+
|0           |1575428400000|116285807|40084087|
+------------+-------------+---------+--------+{code}
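The longitude and latitude values in the row above are stored as longs. The description does not define conversionRatio, so the sketch below rests on an assumption: that the stored value is the coordinate in degrees multiplied by 'SPATIAL_INDEX.mygeohash.conversionRatio' (1000000 in the CREATE TABLE example). The helper names to_degrees and to_stored are hypothetical and only illustrate the arithmetic.

```python
from decimal import Decimal

# Assumption: taken from 'SPATIAL_INDEX.mygeohash.conversionRatio' in the
# CREATE TABLE example; stored value = degrees * ratio.
CONVERSION_RATIO = 1_000_000


def to_degrees(stored: int, ratio: int = CONVERSION_RATIO) -> Decimal:
    """Convert a stored long coordinate back to degrees."""
    return Decimal(stored) / Decimal(ratio)


def to_stored(degrees: str, ratio: int = CONVERSION_RATIO) -> int:
    """Convert a coordinate in degrees (given as text) to the stored long."""
    return int(Decimal(degrees) * ratio)
```

Under that assumption, the stored longitude 116285807 corresponds to 116.285807 degrees and the stored latitude 40084087 to 40.084087 degrees. Decimal is used instead of float so the round trip is exact.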
was:
The requirement comes from SEQ; the related algorithms are provided by the Discovery group.
1. Replace the geohash encoding algorithm and reduce the number of properties required in CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
timevalue BIGINT,
longitude LONG,
latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
'SPATIAL_INDEX.mygeohash.type'='geohash',
'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
'SPATIAL_INDEX.mygeohash.gridSize'='50',
'SPATIAL_INDEX.mygeohash.conversionRatio'='1000000'){code}
2. Add geo query UDFs.
Query filter UDFs:
* _*InPolygonList(List<String> polygonList, OperationType opType)*_
* _*InPolylineList(List<String> polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList(List<Long[]> RangeList, OperationType opType)*_
*The operation type supports only:*
* *"OR", which calculates the union of two polygons*
* *"AND", which calculates the intersection of two polygons*
Geo util UDFs:
* _*GeoIdToGridXy(Long geoId) : Pair<Integer, Integer>*_
* _*LatLngToGeoId(Long latitude, Long longitude) : Long*_
* _*GeoIdToLatLng(Long geoId) : Pair<Double, Double>*_
* _*ToUpperLayerGeoId(Long geoId) : Long*_
* _*ToRangeList(String polygon) : List<Long[]>*_
3. Currently, GeoID is a column created internally for spatial tables. This PR allows the GeoID column to be customized during LOAD/INSERT INTO. For example,
{code:java}
INSERT INTO geoTable SELECT 0,1575428400000,116285807,40084087;
It used to be as below; '855280799612' was generated internally:
+------------+-------------+---------+--------+
|mygeohash   |timevalue    |longitude|latitude|
+------------+-------------+---------+--------+
|855280799612|1575428400000|116285807|40084087|
+------------+-------------+---------+--------+
but now it is:
+------------+-------------+---------+--------+
|mygeohash   |timevalue    |longitude|latitude|
+------------+-------------+---------+--------+
|0           |1575428400000|116285807|40084087|
+------------+-------------+---------+--------+{code}
> Geo spatial index algorithm improvement and UDFs enhancement
> ------------------------------------------------------------
>
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
> Attachments: CarbonData Spatial Index Design Doc v2.docx
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.3.4#803005)