[GitHub] [carbondata] MarvinLitt opened a new pull request #3520: [WIP]add spatial-index user guid to doc

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

GitBox
VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399399530
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,109 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+# What is a spatial index
+
+[A spatial index](https://gistbok.ucgis.org/topic-keywords/indexing) is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+# How does CarbonData implement a spatial index
+
+There are many open-source implementations of spatial indexing and spatial query processing. CarbonData takes a different approach to spatial indexing. Its core idea is to use raster data. A raster is a matrix of cells organized into rows and columns (called a grid). Each cell represents a coordinate, and the index for that coordinate is generated from its longitude and latitude, similar to the [Z order curve](https://en.wikipedia.org/wiki/Z-order_curve).
+
+CarbonData rasterizes the user data while loading it into segments. A range of latitude and longitude values corresponds to one grid cell, and the size of the grid cell can be configured. Hence, the loaded coordinates are often discrete rather than continuous.
+
+The figure below shows the relationship between a grid cell and the points residing in it. The black point is the center point of the grid cell, and the red points are coordinates at arbitrary positions inside it. The red points can be replaced by the center point to indicate that they lie within the grid cell. During data load, CarbonData generates an index for each coordinate according to the row and column of the grid cell (in the raster) where that coordinate lies. These indexes follow the Z order. For the detailed conversion algorithm, please refer to the design documents of the spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user supplies the real-world coordinates of a polygon. CarbonData uses the polygon, together with the spatial region information passed in when the table was created, to build a quad tree. The nodes in the quad tree are composed of hash ids generated from the row and column information of the grid cells projected into the polygon area. When a grid cell's center point is not disjoint from the query polygon area, the grid cell is considered selected. In the following figure, the user selects a quadrilateral polygon, and the grid cells whose center points lie in the region are used to generate the quad tree. The query process produces a list of ranges of consecutive ids, such as [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], where each entry in the list represents a continuous grid area. CarbonData uses this range list to prune and filter data. For details, see https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+# Installation and Deployment
+
+Build the source with the geo module enabled. You can open "pom.xml" to check whether the module is enabled.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+The build produces "carbondata-geo-2.0.0-SNAPSHOT.jar". Copy this jar and 'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+A spatial index requires the source columns and other regional information to be specified. CarbonData will create an invisible hash id column.
+
+Example:
+
+```sql
+create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',  
+'INDEX_HANDLER.mygeohash.type'='geohash',  
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',  
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.gridSize'='50',  
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',  
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',  
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',  
+'INDEX_HANDLER.mygeohash.conversionRatio'='1000000');
+```
+
+| Name                                    | Value                 | Description                                                  |
 
 Review comment:
   Modified

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399400512
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,109 @@
+```sql
+create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',  
+'INDEX_HANDLER.mygeohash.type'='geohash',  
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',  
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.gridSize'='50',  
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',  
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',  
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',  
+'INDEX_HANDLER.mygeohash.conversionRatio'='1000000');
 
 Review comment:
   It is already described in the table properties description below. I think it is not required here again.

----------------------------------------------------------------

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r399400743
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,109 @@
+When querying, the user supplies the real-world coordinates of a polygon. CarbonData uses the polygon, together with the spatial region information passed in when the table was created, to build a quad tree. The nodes in the quad tree are composed of hash ids generated from the row and column information of the grid cells projected into the polygon area. When a grid cell's center point is not disjoint from the query polygon area, the grid cell is considered selected. In the following figure, the user selects a quadrilateral polygon, and the grid cells whose center points lie in the region are used to generate the quad tree. The query process produces a list of ranges of consecutive ids, such as [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], where each entry in the list represents a continuous grid area. CarbonData uses this range list to prune and filter data. For details, see https://issues.apache.org/jira/browse/CARBONDATA-3548
 
 Review comment:
   Added
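
   The role of the range list in the quoted paragraph can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not CarbonData's quad tree implementation: the helper names `to_ranges` and `in_ranges` are hypothetical, and the sketch merges every run of consecutive ids, whereas the list in the guide keeps some adjacent entries separate (for example 157->158 and 159->159).

```python
from bisect import bisect_right


def to_ranges(ids):
    """Collapse cell ids into sorted, inclusive (start, end) runs."""
    ranges = []
    for cell_id in sorted(set(ids)):
        if ranges and cell_id == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cell_id)  # extend the current run
        else:
            ranges.append((cell_id, cell_id))      # start a new run
    return ranges


def in_ranges(ranges, cell_id):
    """Check with a binary search whether cell_id lies inside any range."""
    starts = [start for start, _ in ranges]
    pos = bisect_right(starts, cell_id) - 1
    return pos >= 0 and cell_id <= ranges[pos][1]


# Ids of the selected grid cells (hypothetical sample data).
selected = [97, 99, 102] + list(range(104, 112)) + [120, 122, 123]
ranges = to_ranges(selected)
print(ranges)

# Pruning: keep only the rows whose hash id falls inside a selected range.
rows = [(96, "a"), (105, "b"), (121, "c"), (123, "d")]
print([r for r in rows if in_ranges(ranges, r[0])])
```

   Comparing each row's id against a few ranges is cheaper than testing every coordinate against the polygon, which is why a compact range list is useful for pruning.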

----------------------------------------------------------------

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#issuecomment-605214533
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/874/
   

----------------------------------------------------------------

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#issuecomment-605217739
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2582/
   

----------------------------------------------------------------

[GitHub] [carbondata] ajantha-bhat commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
ajantha-bhat commented on issue #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#issuecomment-605788071
 
 
   LGTM

----------------------------------------------------------------

[GitHub] [carbondata] asfgit closed pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
asfgit closed pull request #3520: [CARBONDATA-3548]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520
 
 
   

----------------------------------------------------------------