[GitHub] [carbondata] shenjiayu17 opened a new pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

classic Classic list List threaded Threaded
127 messages Options
1234 ... 7
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] shenjiayu17 opened a new pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

shenjiayu17 opened a new pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012


    ### Why is this PR needed?
    Spatial index feature optimization of CarbonData
   
    ### What changes were proposed in this PR?
   1. Update spatial index encoded algorithm, which can reduce the required properties of creating geo table
   2. Enhance geo query UDFs, support querying geo table with polygon list, polyline list, geoId range list. And add some geo transforming util UDFs.
   3. Load data (include LOAD and INSERT INTO) allows user to input spatial index, which column will still generated internally when user does not give.
   
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

ajantha-bhat commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731243925


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731248638


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3077/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731249184


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4833/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731890305


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3097/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731890504


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4850/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731919095


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4851/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731920282


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3098/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732021180


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3101/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732021580


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4854/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] marchpure commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

marchpure commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732209550


   @MarvinLitt  Please review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r529216859



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java
##########
@@ -26,4 +26,31 @@ private GeoConstants() {
 
   // GeoHash type Spatial Index
   public static final String GEOHASH = "geohash";
+
+  // Regular expression to parse input polygons for IN_POLYGON_LIST
+  // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)";
+  public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";
+
+  // Regular expression to parse input polylines for IN_POLYLINE_LIST
+  public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)";
+
+  // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST
+  public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))";
+
+  // delimiter of input points or ranges
+  public static final String DEFAULT_DELIMITER = ",";
+
+  // conversion factor of angle to radian
+  public static final double CONVERT_FACTOR = 180.0;
+  // Earth radius
+  public static final double EARTH_RADIUS = 6371004.0;

Review comment:
       Can remove EARTH_RADIUS const def from GeoHashIndex.java now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733432993


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4892/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733433233


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3138/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r530130471



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java
##########
@@ -26,4 +26,31 @@ private GeoConstants() {
 
   // GeoHash type Spatial Index
   public static final String GEOHASH = "geohash";
+
+  // Regular expression to parse input polygons for IN_POLYGON_LIST
+  // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)";
+  public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";
+
+  // Regular expression to parse input polylines for IN_POLYLINE_LIST
+  public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)";
+
+  // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST
+  public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))";
+
+  // delimiter of input points or ranges
+  public static final String DEFAULT_DELIMITER = ",";
+
+  // conversion factor of angle to radian
+  public static final double CONVERT_FACTOR = 180.0;
+  // Earth radius
+  public static final double EARTH_RADIUS = 6371004.0;

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733502985


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4896/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733503678


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3142/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r529163393



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java
##########
@@ -26,4 +26,31 @@ private GeoConstants() {
 
   // GeoHash type Spatial Index
   public static final String GEOHASH = "geohash";
+
+  // Regular expression to parse input polygons for IN_POLYGON_LIST
+  // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)";
+  public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";
+
+  // Regular expression to parse input polylines for IN_POLYLINE_LIST
+  public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)";
+
+  // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST
+  public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))";
+
+  // delimiter of input points or ranges
+  public static final String DEFAULT_DELIMITER = ",";
+
+  // conversion factor of angle to radian
+  public static final double CONVERT_FACTOR = 180.0;
+  // Earth radius
+  public static final double EARTH_RADIUS = 6371004.0;
+
+  // used in Geo Hash calculation formula for improving calculation accuracy
+  public static final int CONVERSION_RATIO = 100000000;
+
+  // used for multiplying input longitude and latitude which are processed by * 10E6
+  public static final int CONVERSION_FACTOR_FOR_ACCURACY = 100;
+
+  // Length in meters of 1 degree of latitude
+  public static final double CONVERSION_FACTOR_OF_METER_TO_DEGREE = 111320;

Review comment:
       delete comment code

##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashIndex.java
##########
@@ -245,48 +195,24 @@ public String generate(List<?> sources) throws Exception {
    */
   @Override
   public List<Long[]> query(String polygon) throws Exception {
-    if (!validate(polygon)) {
-      return null;
-    } else {
-      String[] pointList = polygon.trim().split(",");
-      List<double[]> queryList = new ArrayList<>();
-      for (String str: pointList) {
-        String[] points = splitString(str);
-        if (2 != points.length) {
-          throw new RuntimeException("longitude and latitude is a pair need 2 data");
-        } else {
-          try {
-            queryList.add(new double[] {Double.valueOf(points[0]), Double.valueOf(points[1])});
-          } catch (NumberFormatException e) {
-            throw new RuntimeException("can not covert the string data to double", e);
-          }
-        }
-      }
-      if (!checkPointsSame(pointList[0], pointList[pointList.length - 1])) {
-        throw new RuntimeException("the first point and last point in polygon should be same");
-      } else {
-        List<Long[]> rangeList = getPolygonRangeList(queryList);
-        return rangeList;
-      }
+    try {
+      List<double[]> queryList = GeoHashUtils.getPointListFromPolygon(polygon);
+      return getPolygonRangeList(queryList);
+    } catch (Exception e) {

Review comment:
       specific exception type should be specified

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -95,6 +95,12 @@ class CarbonEnv {
     sparkSession.udf.register("text_match", new TextMatchUDF)
     sparkSession.udf.register("text_match_with_limit", new TextMatchMaxDocUDF)
     sparkSession.udf.register("in_polygon", new InPolygonUDF)
+    sparkSession.udf.register("in_polygon_list", new InPolygonListUDF)
+    sparkSession.udf.register("in_polyline_list", new InPolylineListUDF)
+    sparkSession.udf.register("in_polygon_range_list", new InPolygonRangeListUDF)
+

Review comment:
       register code should not at here, here is the overall process should not contain details func

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CustomIndex.java
##########
@@ -59,6 +59,14 @@
    */
   public abstract ReturnType query(String query) throws Exception;
 
+  /**
+   * Query processor for custom index.
+   * @param queryPointList query point list for GeoHashIndex
+   * @return Returns list of ranges to be fetched
+   * @throws Exception
+   */
+  public abstract ReturnType query(List<double[]> queryPointList) throws Exception;
+

Review comment:
       please add jira url to show the theory

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -95,6 +95,12 @@ class CarbonEnv {
     sparkSession.udf.register("text_match", new TextMatchUDF)
     sparkSession.udf.register("text_match_with_limit", new TextMatchMaxDocUDF)
     sparkSession.udf.register("in_polygon", new InPolygonUDF)
+    sparkSession.udf.register("in_polygon_list", new InPolygonListUDF)
+    sparkSession.udf.register("in_polyline_list", new InPolylineListUDF)
+    sparkSession.udf.register("in_polygon_range_list", new InPolygonRangeListUDF)
+

Review comment:
       More and more, consider to use a function processing separately




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r530356529



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
##########
@@ -17,6 +17,8 @@
 
 package org.apache.carbondata.geo
 
+import scala.collection.mutable
+

Review comment:
       please add some more udf test for intersection status.
   like full inclusion, semi inclusion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733735461






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


1234 ... 7