[GitHub] [carbondata] MarvinLitt opened a new pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
MarvinLitt opened a new pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520
 
 
   Be sure to do all of the following checklist to help us incorporate
   your contribution quickly and easily:
   
    - [ ] Any interfaces changed?
   
    - [ ] Any backward compatibility impacted?
   
    - [ ] Document update required?
   
    - [ ] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
         
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] SachinR12 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
SachinR12 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364136985
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded when it is load into segments. A set of latitude and longitude represents a grid range, the size of the grid can be specified artificially. So the coordinates of the loaded points are often discrete and not continuous.
+
+The grid and point relationship is like that black point is the middle of a grid, the red dot is just inside the grid. The red point is inside the grid, it can be replaced by the center point of the grid, indicating that the point is within the grid. Therefore, the coordinates of points in a grid are replaced by black points in the middle. This is the characteristic of data load.  At the same time of data load, carbondata will generate hash ID according to the coordinates of rows and columns of the grid. These hash IDs are the same as Z order when querying. Detailed conversion algorithm can refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area and group photo in map area. When the query polygon area is not disjon from the grid center point, the grid is considered selected.  In the following figure, user select a quadrilateral polygon,  The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+## Installation and Deployment
+
+Build source with modules geo open, can open "pom.xml" and check whether the mode has been open.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+Then you can get the "carbondata-geo-2.0.0-SNAPSHOT.jar" keep this jar and 'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+spatial index need to appoint the source column and other regional information. carbon will create a Invisible hash id column.
+
+example
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',
+'INDEX_HANDLER.mygeohash.type'='geohash',  
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',  
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.gridSize'='50',  
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',  
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',  
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',  
+'INDEX_HANDLER.mygeohash.conversionRatio'='1000000');
+```
+
+| **Property**  | **Description**                                              |
+| ------------- | :----------------------------------------------------------- |
+| INDEX_HANDLER | Custom index handler. This handler allows user to  create a new column from the set of schema columns. Newly created column name  is same as that of handler name. Type and sourcecolumns properties for the handler are mandatory  properties. At present, only supported value for type property is 'geohash'. A simple  default implementation class can be available in carbon. User can hook their custom implementation class  for  geohash by extending the default implementation.     An  advanced sophisticated custom implementation of geohash can also be  made available in carbon. It can expect following additional table properties  for handler:  'INDEX_HANDLER.xxx.gridSize',  'INDEX_HANDLER.xxx.minLongitude',   'INDEX_HANDLER.xxx.maxLongitude',   'INDEX_HANDLER.xxx.minLatitude',   'INDEX_HANDLER.xxx.maxLatitude',     User can add their own table properties for  handler similar to above format and access them in their custom implementation  class. All those properties are optional at carbon. Can specify their implementation  class with 'INDEX_HANDLER.xxx.class' property  Default implementation class can support handler  column value generation taking source column values into account for each row  and support query filtering based on the source columns. Generated handler column  is invisible to user. Column is not allowed in any DDL commands and  properties except in SORT_COLUMNS table property. ' conversionRatio' allow user to translate Longitude and  Latitude to long. Example: when data loading the real Longitude=13.123456, Latitude=101.12356, configure INDEX_HANDLER.mygeohash.conversionRatio= '1000000', then user can change data to Longitude=13123456, Latitude=10112356. they are long type simpler and faster than floating-point number in calculation. |
+
+
+
+### Data Loading
+
+In this process, tables with spatial indexes configured will create invisible hash ids column data based on  'sourcecolumns' user defined and store it. Please ensure that the specified sourcecolumns column has data.
+
+### Data query
+
+CarbonData implements a UDF named 'IN_POLYGON'.
+
+example:
+
+```
+select * from source_index where IN_POLYGON('16.321011 4.123503,16.137676 5.947911,16.560993 5.935276,16.321011 4.123503'
+```
 
 Review comment:
   NIT: Add the closing parenthesis for IN_POLYGON


[GitHub] [carbondata] pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364155418
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded when it is load into segments. A set of latitude and longitude represents a grid range, the size of the grid can be specified artificially. So the coordinates of the loaded points are often discrete and not continuous.
+
+The grid and point relationship is like that black point is the middle of a grid, the red dot is just inside the grid. The red point is inside the grid, it can be replaced by the center point of the grid, indicating that the point is within the grid. Therefore, the coordinates of points in a grid are replaced by black points in the middle. This is the characteristic of data load.  At the same time of data load, carbondata will generate hash ID according to the coordinates of rows and columns of the grid. These hash IDs are the same as Z order when querying. Detailed conversion algorithm can refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area and group photo in map area. When the query polygon area is not disjon from the grid center point, the grid is considered selected.  In the following figure, user select a quadrilateral polygon,  The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+## Installation and Deployment
+
+Build source with modules geo open, can open "pom.xml" and check whether the mode has been open.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+Then you can get the "carbondata-geo-2.0.0-SNAPSHOT.jar" keep this jar and 'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+spatial index need to appoint the source column and other regional information. carbon will create a Invisible hash id column.
+
+example
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',
+'INDEX_HANDLER.mygeohash.type'='geohash',  
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',  
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.gridSize'='50',  
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',  
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',  
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',  
+'INDEX_HANDLER.mygeohash.conversionRatio'='1000000');
+```
 
 Review comment:
   Please add a one-line description for each of the INDEX_HANDLER properties for better understanding.


[GitHub] [carbondata] pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364160576
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded when it is load into segments. A set of latitude and longitude represents a grid range, the size of the grid can be specified artificially. So the coordinates of the loaded points are often discrete and not continuous.
+
+The grid and point relationship is like that black point is the middle of a grid, the red dot is just inside the grid. The red point is inside the grid, it can be replaced by the center point of the grid, indicating that the point is within the grid. Therefore, the coordinates of points in a grid are replaced by black points in the middle. This is the characteristic of data load.  At the same time of data load, carbondata will generate hash ID according to the coordinates of rows and columns of the grid. These hash IDs are the same as Z order when querying. Detailed conversion algorithm can refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area and group photo in map area. When the query polygon area is not disjon from the grid center point, the grid is considered selected.  In the following figure, user select a quadrilateral polygon,  The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548
 
 Review comment:
    The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area and **group photo** in map area.


[GitHub] [carbondata] pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364161231
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded when it is load into segments. A set of latitude and longitude represents a grid range, the size of the grid can be specified artificially. So the coordinates of the loaded points are often discrete and not continuous.
+
+The grid and point relationship is like that black point is the middle of a grid, the red dot is just inside the grid. The red point is inside the grid, it can be replaced by the center point of the grid, indicating that the point is within the grid. Therefore, the coordinates of points in a grid are replaced by black points in the middle. This is the characteristic of data load.  At the same time of data load, carbondata will generate hash ID according to the coordinates of rows and columns of the grid. These hash IDs are the same as Z order when querying. Detailed conversion algorithm can refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area and group photo in map area. When the query polygon area is not disjon from the grid center point, the grid is considered selected.  In the following figure, user select a quadrilateral polygon,  The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548
 
 Review comment:
   This statement is not clear. "group photo" looks like a typo
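
   For readers following the quoted paragraph: the range list it shows (for example `104->111`) can be read as runs of consecutive grid hash IDs. The sketch below is only an illustration of that idea, assuming the selected hash IDs are already available as plain integers. The function name `collapse_to_ranges` is hypothetical, and this is not CarbonData's quadtree code.

```python
def collapse_to_ranges(ids):
    """Collapse integer IDs into sorted, inclusive (start, end) runs."""
    ranges = []
    for value in sorted(set(ids)):
        if ranges and value == ranges[-1][1] + 1:
            # Value continues the current run, so extend its end.
            ranges[-1] = (ranges[-1][0], value)
        else:
            # Gap found (or first value): start a new run.
            ranges.append((value, value))
    return ranges


# Hypothetical selected grid IDs, taken from the first entries of the quoted list.
selected = [97, 99, 102, 104, 105, 106, 107, 108, 109, 110, 111, 120, 122, 123]
for start, end in collapse_to_ranges(selected):
    print(f"{start}->{end}")
```

   Each printed pair corresponds to one contiguous grid area, which is the unit the paragraph says is used for pruning and filtering.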


[GitHub] [carbondata] pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364162923
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded when it is load into segments. A set of latitude and longitude represents a grid range, the size of the grid can be specified artificially. So the coordinates of the loaded points are often discrete and not continuous.
+
+The grid and point relationship is like that black point is the middle of a grid, the red dot is just inside the grid. The red point is inside the grid, it can be replaced by the center point of the grid, indicating that the point is within the grid. Therefore, the coordinates of points in a grid are replaced by black points in the middle. This is the characteristic of data load.  At the same time of data load, carbondata will generate hash ID according to the coordinates of rows and columns of the grid. These hash IDs are the same as Z order when querying. Detailed conversion algorithm can refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area and group photo in map area. When the query polygon area is not disjon from the grid center point, the grid is considered selected.  In the following figure, user select a quadrilateral polygon,  The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+## Installation and Deployment
+
+Build source with modules geo open, can open "pom.xml" and check whether the mode has been open.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+Then you can get the "carbondata-geo-2.0.0-SNAPSHOT.jar" keep this jar and 'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+spatial index need to appoint the source column and other regional information. carbon will create a Invisible hash id column.
+
+example
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',
+'INDEX_HANDLER.mygeohash.type'='geohash',  
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',  
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.gridSize'='50',  
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',  
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',  
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',  
+'INDEX_HANDLER.mygeohash.conversionRatio'='1000000');
+```
+
+| **Property**  | **Description**                                              |
+| ------------- | :----------------------------------------------------------- |
+| INDEX_HANDLER | Custom index handler. This handler allows user to  create a new column from the set of schema columns. Newly created column name  is same as that of handler name. Type and sourcecolumns properties for the handler are mandatory  properties. At present, only supported value for type property is 'geohash'. A simple  default implementation class can be available in carbon. User can hook their custom implementation class  for  geohash by extending the default implementation.     An  advanced sophisticated custom implementation of geohash can also be  made available in carbon. It can expect following additional table properties  for handler:  'INDEX_HANDLER.xxx.gridSize',  'INDEX_HANDLER.xxx.minLongitude',   'INDEX_HANDLER.xxx.maxLongitude',   'INDEX_HANDLER.xxx.minLatitude',   'INDEX_HANDLER.xxx.maxLatitude',     User can add their own table properties for  handler similar to above format and access them in their custom implementation  class. All those properties are optional at carbon. Can specify their implementation  class with 'INDEX_HANDLER.xxx.class' property  Default implementation class can support handler  column value generation taking source column values into account for each row  and support query filtering based on the source columns. Generated handler column  is invisible to user. Column is not allowed in any DDL commands and  properties except in SORT_COLUMNS table property. ' conversionRatio' allow user to translate Longitude and  Latitude to long. Example: when data loading the real Longitude=13.123456, Latitude=101.12356, configure INDEX_HANDLER.mygeohash.conversionRatio= '1000000', then user can change data to Longitude=13123456, Latitude=10112356. they are long type simpler and faster than floating-point number in calculation. |
+
+
+
+### Data Loading
+
+In this process, tables with spatial indexes configured will create invisible hash ids column data based on  'sourcecolumns' user defined and store it. Please ensure that the specified sourcecolumns column has data.
+
+### Data query
+
+CarbonData implements a UDF named 'IN_POLYGON'.
+
+example:
+
+```
+select * from source_index where IN_POLYGON('16.321011 4.123503,16.137676 5.947911,16.560993 5.935276,16.321011 4.123503'
 
 Review comment:
   Please mention the minimum number of co-ordinates required for IN_POLYGON UDF.
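
On the minimum coordinate count: the quoted example passes four coordinate pairs, with the first pair repeated as the last. The Python sketch below illustrates that convention. It assumes a closed ring written as comma-separated "x y" pairs and uses a generic ray-casting test. The function names are hypothetical and this is not CarbonData's IN_POLYGON implementation, so the minimum the UDF actually enforces still needs to be confirmed by the author.

```python
def parse_polygon(text):
    """Parse 'x1 y1,x2 y2,...' into a list of (x, y) float pairs."""
    points = []
    for pair in text.split(","):
        x, y = pair.split()
        points.append((float(x), float(y)))
    return points


def is_valid_ring(points):
    """Assumed rule: a closed ring repeats its first point as its last,
    so the smallest polygon (a triangle) needs four coordinate pairs."""
    return len(points) >= 4 and points[0] == points[-1]


def contains(points, px, py):
    """Ray casting: toggle 'inside' for each edge crossed by a ray going right."""
    inside = False
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


ring = parse_polygon(
    "16.321011 4.123503,16.137676 5.947911,16.560993 5.935276,16.321011 4.123503"
)
```

Under these assumptions the quoted polygon is the smallest valid input: three distinct corners plus the closing point.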


[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364620267
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
 
 Review comment:
   This is the first header in doc. Should be single #


[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364625182
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
 
 Review comment:
   This whole section is lifted from https://gistbok.ucgis.org/topic-keywords/indexing
   Better to credit or cite that source to avoid plagiarism.


[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364625466
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
 
 Review comment:
   Should be single #. It is a first level header.


[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364629046
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
 
 Review comment:
   1. "N" should be capitalized, since it starts a new sentence after the full stop - "now carbondata"
   2. Remove the double space at - "implements  a"
   3. Remove the double space after the full stop at - "UDF.  Its"


[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364681431
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
 
 Review comment:
   " What does carbondata implement spatial index" should be changed to  "How does carbondata implement spatial index"


[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364681826
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
 
 Review comment:
   "it's also regionally continuous."  --> this is confusing and can be rephrased.

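On rephrasing "regionally continuous": the property the text seems to be reaching for is that cells close together on the grid tend to get nearby IDs. The Python sketch below shows this with a plain Z-order (Morton) encoding. It assumes IDs are built by interleaving the bits of a grid row and column, and `z_order_id` is a hypothetical name. CarbonData's actual hash ID generation may differ, so the design document remains the reference.

```python
def z_order_id(row, col, bits=16):
    """Interleave column bits (even positions) with row bits (odd positions)."""
    result = 0
    for i in range(bits):
        result |= ((col >> i) & 1) << (2 * i)
        result |= ((row >> i) & 1) << (2 * i + 1)
    return result


# IDs of the aligned 2x2 block of cells at rows 2-3, columns 2-3.
block_ids = sorted(z_order_id(r, c) for r in (2, 3) for c in (2, 3))
```

In this sketch the four cells of an aligned 2x2 block map to one unbroken run of IDs, which may be a clearer way to state the property in the doc.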

[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364682173
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded when it is load into segments. A set of latitude and longitude represents a grid range, the size of the grid can be specified artificially. So the coordinates of the loaded points are often discrete and not continuous.
 
 Review comment:
   "data has been gridded" - This can be changed to "data has been arranged as grid"


[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364684380
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded when it is load into segments. A set of latitude and longitude represents a grid range, the size of the grid can be specified artificially. So the coordinates of the loaded points are often discrete and not continuous.
+
+The grid and point relationship is like that black point is the middle of a grid, the red dot is just inside the grid. The red point is inside the grid, it can be replaced by the center point of the grid, indicating that the point is within the grid. Therefore, the coordinates of points in a grid are replaced by black points in the middle. This is the characteristic of data load.  At the same time of data load, carbondata will generate hash ID according to the coordinates of rows and columns of the grid. These hash IDs are the same as Z order when querying. Detailed conversion algorithm can refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area and group photo in map area. When the query polygon area is not disjon from the grid center point, the grid is considered selected.  In the following figure, user select a quadrilateral polygon,  The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548
 
 Review comment:
   "When the query polygon area is not disjon " - disjon can be changed to disjoint

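Related to the same paragraph: the quoted range list (for example 104->111) reads as runs of consecutive hash IDs. The Python sketch below illustrates that reading. It assumes the selected grid cells are available as a plain list of integer IDs. `to_ranges` and `in_ranges` are hypothetical helpers, not CarbonData functions, and the real pruning logic is described in CARBONDATA-3548.

```python
import bisect


def to_ranges(ids):
    """Collapse integer IDs into sorted (start, end) runs of consecutive values."""
    ranges = []
    for value in sorted(set(ids)):
        if ranges and value == ranges[-1][1] + 1:
            ranges[-1][1] = value  # extend the current run
        else:
            ranges.append([value, value])  # start a new run
    return [tuple(r) for r in ranges]


def in_ranges(ranges, value):
    """Binary-search the run whose start is the largest one <= value."""
    starts = [start for start, _ in ranges]
    i = bisect.bisect_right(starts, value) - 1
    return i >= 0 and value <= ranges[i][1]


selected = [97, 99, 102] + list(range(104, 112)) + [120, 122, 123]
ranges = to_ranges(selected)
```

With this input the first six entries of the quoted list are reproduced: 97->97, 99->99, 102->102, 104->111, 120->120, 122->123.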

[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364685343
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object efficiently. It is a common technique used by spatial databases.  Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. Various types of spatial indices across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques are playing a central role in time-critical applications and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded when it is load into segments. A set of latitude and longitude represents a grid range, the size of the grid can be specified artificially. So the coordinates of the loaded points are often discrete and not continuous.
+
+The grid and point relationship is like that black point is the middle of a grid, the red dot is just inside the grid. The red point is inside the grid, it can be replaced by the center point of the grid, indicating that the point is within the grid. Therefore, the coordinates of points in a grid are replaced by black points in the middle. This is the characteristic of data load.  At the same time of data load, carbondata will generate hash ID according to the coordinates of rows and columns of the grid. These hash IDs are the same as Z order when querying. Detailed conversion algorithm can refer to the design documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata use the polygon and spatial region information passed in when creating a table build a quad tree. The nodes in the quad tree are composed of hash ids generated by the row and column information projected in the polygon area and group photo in map area. When the query polygon area is not disjon from the grid center point, the grid is considered selected.  In the following figure, user select a quadrilateral polygon,  The grid with the center point in the region will generate a quadtree. A list of line with continuous properties will be generated in the query process, like [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], each part of the list represents a continuous grid area. Carbondata use that line list to prune and filtered. About the detail can be search under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+## Installation and Deployment
+
+Build source with modules geo open, can open "pom.xml" and check whether the mode has been open.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+Then you can get the "carbondata-geo-2.0.0-SNAPSHOT.jar" keep this jar and 'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+spatial index need to appoint the source column and other regional information. carbon will create a Invisible hash id column.
+
+example
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',
+'INDEX_HANDLER.mygeohash.type'='geohash',  
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',  
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.gridSize'='50',  
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',  
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',  
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',  
+'INDEX_HANDLER.mygeohash.conversionRatio'='1000000');
+```
+
+| **Property**  | **Description**                                              |
+| ------------- | :----------------------------------------------------------- |
+| INDEX_HANDLER | Custom index handler. This handler allows user to  create a new column from the set of schema columns. Newly created column name  is same as that of handler name. Type and sourcecolumns properties for the handler are mandatory  properties. At present, only supported value for type property is 'geohash'. A simple  default implementation class can be available in carbon. User can hook their custom implementation class  for  geohash by extending the default implementation.     An  advanced sophisticated custom implementation of geohash can also be  made available in carbon. It can expect following additional table properties  for handler:  'INDEX_HANDLER.xxx.gridSize',  'INDEX_HANDLER.xxx.minLongitude',   'INDEX_HANDLER.xxx.maxLongitude',   'INDEX_HANDLER.xxx.minLatitude',   'INDEX_HANDLER.xxx.maxLatitude',     User can add their own table properties for  handler similar to above format and access them in their custom implementation  class. All those properties are optional at carbon. Can specify their implementation  class with 'INDEX_HANDLER.xxx.class' property  Default implementation class can support handler  column value generation taking source column values into account for each row  and support query filtering based on the source columns. Generated handler column  is invisible to user. Column is not allowed in any DDL commands and  properties except in SORT_COLUMNS table property. ' conversionRatio' allow user to translate Longitude and  Latitude to long. Example: when data loading the real Longitude=13.123456, Latitude=101.12356, configure INDEX_HANDLER.mygeohash.conversionRatio= '1000000', then user can change data to Longitude=13123456, Latitude=10112356. they are long type simpler and faster than floating-point number in calculation. |
 
 Review comment:
   "Can specify their implementation  class with 'INDEX_HANDLER.xxx.class' property " can be changed to "User Can specify their implementation  class with 'INDEX_HANDLER.xxx.class' property"

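A further note on the 'conversionRatio' example in the same table row: the scaling can be checked with a short Python sketch. It assumes the conversion is a plain multiplication of the decimal coordinate by the ratio, and `to_scaled_long` is a hypothetical helper, not part of CarbonData. Coordinates are passed as strings so that `Decimal` keeps the digits exactly.

```python
from decimal import Decimal


def to_scaled_long(value, conversion_ratio=1000000):
    """Multiply a decimal coordinate (given as a string) by the ratio and return an int."""
    return int(Decimal(value) * conversion_ratio)


longitude = to_scaled_long("13.123456")
latitude = to_scaled_long("101.12356")
```

By this arithmetic 13.123456 becomes 13123456, which matches the doc, while 101.12356 becomes 101123560 rather than the quoted 10112356, so the example values may be worth double-checking.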

[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364686383
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+## What is a spatial index
+
+A spatial index is a data structure that allows spatial objects to be accessed efficiently. It is a common technique in spatial databases. Without indexing, any search for a feature requires a "sequential scan" of every record in the database, resulting in much longer processing time. When a spatial index is constructed, the minimum bounding rectangle serves as an approximation of each object. The various types of spatial indices used across commercial and open-source databases yield measurable performance differences. Spatial indexing techniques play a central role in time-critical applications and in the manipulation of spatial big data.
+
+
+
+## How CarbonData implements the spatial index
+
+Many components implement spatial indexing, such as GeoSpark, which uses the GeoMesa format for spatial queries. CarbonData implements the spatial index differently, more like a UDF. Its core idea is to generate a coordinate-based hash ID from grid coordinates; like Z order, the IDs are regionally contiguous.
+
+CarbonData implements a grid spatial index. It requires that the data has been gridded by the time it is loaded into segments. A latitude and longitude pair represents a grid range, and the grid size can be configured. As a result, the coordinates of loaded points are usually discrete rather than continuous.
+
+In the figure below, the black point is the center of a grid cell and the red point lies somewhere inside the cell. Because the red point is inside the cell, it can be replaced by the cell's center point to indicate that it falls within that cell. The coordinates of every point in a cell are therefore replaced by the black center point. This is characteristic of data loading. During data loading, CarbonData also generates a hash ID from the row and column coordinates of the grid cell. These hash IDs follow the same Z order that is used when querying. The detailed conversion algorithm is described in the spatial index design documents.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+At query time, the user supplies the real-world coordinates of a polygon. CarbonData uses the polygon and the spatial region information given at table creation to build a quadtree. The nodes of the quadtree consist of hash IDs generated from the row and column information of the grid cells projected onto the polygon area and the map area. A grid cell is considered selected when its center point is not disjoint from the query polygon. In the following figure, the user selects a quadrilateral, and the grid cells whose center points lie in that region form the quadtree. The query process produces a list of contiguous ranges, such as [97->97  99->99  102->102  104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  225->225  228->229], where each entry represents a contiguous grid area. CarbonData uses this range list for pruning and filtering. For details, see https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+## Installation and Deployment
+
+Build the source with the geo module enabled. Open "pom.xml" to check whether the module is enabled.
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+The build produces "carbondata-geo-2.0.0-SNAPSHOT.jar". Place this jar, together with 'jst-core.jar', in your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+A spatial index requires the source columns and the regional information to be specified. Carbon then creates an invisible hash ID column.
+
+Example:
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',
+'INDEX_HANDLER.mygeohash.type'='geohash',  
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',  
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.gridSize'='50',  
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',  
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',  
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',  
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',  
+'INDEX_HANDLER.mygeohash.conversionRatio'='1000000');
+```
+
+| **Property**  | **Description**                                              |
+| ------------- | :----------------------------------------------------------- |
+| INDEX_HANDLER | Custom index handler. The handler lets the user create a new column from a set of schema columns. The new column has the same name as the handler. The type and sourcecolumns properties of the handler are mandatory. At present, the only supported value for the type property is 'geohash'. A simple default implementation class is available in carbon, and users can hook in their own geohash implementation by extending it. A more sophisticated custom geohash implementation can also be made available in carbon; it can expect the following additional table properties for the handler: 'INDEX_HANDLER.xxx.gridSize', 'INDEX_HANDLER.xxx.minLongitude', 'INDEX_HANDLER.xxx.maxLongitude', 'INDEX_HANDLER.xxx.minLatitude', 'INDEX_HANDLER.xxx.maxLatitude'. Users can add their own table properties for the handler in the same format and access them in their custom implementation class. All of those properties are optional in carbon. Can specify their implementation  class with 'INDEX_HANDLER.xxx.class' property. The default implementation class generates the handler column value for each row from the source column values and supports query filtering based on the source columns. The generated handler column is invisible to the user and is not allowed in any DDL command or property except the SORT_COLUMNS table property. 'conversionRatio' lets the user convert longitude and latitude to long values. Example: if the real data is Longitude=13.123456, Latitude=101.12356, configure INDEX_HANDLER.mygeohash.conversionRatio='1000000' and load the data as Longitude=13123456, Latitude=101123560. they are long type simpler and faster than floating-point number in calculation. |
 
 Review comment:
   "they are long type simpler and faster than floating-point number in calculation." can be changed to "They are of long type, which is simpler and faster to compute with than floating-point numbers."

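To illustrate the conversion that the `conversionRatio` description refers to, here is a minimal Python sketch of scaling decimal coordinates to long values before loading. `to_long` is a hypothetical helper, not a CarbonData API; the sample inputs are taken from the table row and the `CREATE TABLE` example, and `Decimal` is used only so that the scaling is exact.

```python
from decimal import Decimal


def to_long(coordinate: str, conversion_ratio: int = 1_000_000) -> int:
    """Scale a decimal coordinate string to an integer (hypothetical helper)."""
    scaled = Decimal(coordinate) * conversion_ratio
    # Reject inputs that carry more precision than the ratio can represent.
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{coordinate} has more digits than ratio {conversion_ratio} keeps")
    return int(scaled)


print(to_long("13.123456"))  # 13123456
print(to_long("19.832277"))  # 19832277
```

Working with the scaled integers avoids floating-point rounding when cells are computed, which appears to be the motivation given in the table row.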

[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364687660
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+### Data Loading
+
+During data loading, a table configured with a spatial index generates and stores the invisible hash ID column from the user-defined 'sourcecolumns'. Ensure that the specified source columns contain data.
+
+### Data Query
+
+CarbonData implements a UDF named 'IN_POLYGON'.
+
+Example:
+
+```
+select * from source_index where IN_POLYGON('16.321011 4.123503,16.137676 5.947911,16.560993 5.935276,16.321011 4.123503'
 
 Review comment:
   Add the closing parenthesis and a terminating semicolon to this query.


[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that use GeoMesa format for spatial query. now carbondata implements  a different way of spatial index, more like an UDF.  Its core is to use grid coordinates to generate coordinate based hash ID, like Z order, it's also regionally continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts of it. To be clear, I have attached the modified text below. Can we rephrase it like this:
   There are many open-source implementations for spatial indexing and for processing spatial queries. CarbonData implements the spatial index in a different way. Its core idea is to use raster data. A raster is made up of a matrix of cells organized into rows and columns (called a grid). Each cell represents a coordinate, and the index for the coordinate is generated from the longitude and latitude pair, like the Z-order curve.
   
   CarbonData implements a grid spatial index. It requires rasterization of the data before loading into segments. A latitude and longitude pair represents a grid range. The size of the grid can be configured, so the coordinates of the loaded points are often discrete rather than continuous.
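
The raster idea in the suggested text can be sketched as follows. This is only an illustration of a generic Z-order (Morton) encoding under assumed parameters, not CarbonData's actual hash ID algorithm: `grid_cell`, `z_order_index`, the 16-bit width, and the assumption that the cell size is expressed in the same scaled integer units as the coordinates are all hypothetical.

```python
def grid_cell(lon: int, lat: int, min_lon: int, min_lat: int, cell_size: int) -> tuple[int, int]:
    """Map scaled integer coordinates to a (row, col) grid cell (hypothetical helper)."""
    col = (lon - min_lon) // cell_size
    row = (lat - min_lat) // cell_size
    if row < 0 or col < 0:
        raise ValueError("point lies outside the configured region")
    return row, col


def z_order_index(row: int, col: int, bits: int = 16) -> int:
    """Interleave the bits of col (even positions) and row (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((col >> i) & 1) << (2 * i)
        z |= ((row >> i) & 1) << (2 * i + 1)
    return z


# Sample values: the region origin reuses minLongitude/minLatitude from the
# CREATE TABLE example scaled by 1000000; the point itself is made up.
row, col = grid_cell(1811990, 19832330, min_lon=1811865, min_lat=19832277, cell_size=50)
print(row, col)                  # 1 2
print(z_order_index(row, col))   # 6
```

Neighbouring cells tend to receive nearby index values under this encoding, which is why a polygon query can be expressed as a short list of contiguous index ranges.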
   

