Login  Register

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

Posted by GitBox on Nov 25, 2020; 12:53pm
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/GitHub-carbondata-shenjiayu17-opened-a-new-pull-request-4012-CARBONDATA-4051-Geo-spatial-index-algort-tp103286p103580.html


MarvinLitt commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r529163393



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java
##########
@@ -26,4 +26,31 @@ private GeoConstants() {
 
   // GeoHash type Spatial Index
   public static final String GEOHASH = "geohash";
+
+  // Regular expression to parse input polygons for IN_POLYGON_LIST
+  // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)";
+  public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";
+
+  // Regular expression to parse input polylines for IN_POLYLINE_LIST
+  public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)";
+
+  // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST
+  public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))";
+
+  // delimiter of input points or ranges
+  public static final String DEFAULT_DELIMITER = ",";
+
+  // conversion factor of angle to radian
+  public static final double CONVERT_FACTOR = 180.0;
+  // Earth radius
+  public static final double EARTH_RADIUS = 6371004.0;
+
+  // used in Geo Hash calculation formula for improving calculation accuracy
+  public static final int CONVERSION_RATIO = 100000000;
+
+  // used for multiplying input longitude and latitude which are processed by * 10E6
+  public static final int CONVERSION_FACTOR_FOR_ACCURACY = 100;
+
+  // Length in meters of 1 degree of latitude
+  public static final double CONVERSION_FACTOR_OF_METER_TO_DEGREE = 111320;

Review comment:
       delete comment code

##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashIndex.java
##########
@@ -245,48 +195,24 @@ public String generate(List<?> sources) throws Exception {
    */
   @Override
   public List<Long[]> query(String polygon) throws Exception {
-    if (!validate(polygon)) {
-      return null;
-    } else {
-      String[] pointList = polygon.trim().split(",");
-      List<double[]> queryList = new ArrayList<>();
-      for (String str: pointList) {
-        String[] points = splitString(str);
-        if (2 != points.length) {
-          throw new RuntimeException("longitude and latitude is a pair need 2 data");
-        } else {
-          try {
-            queryList.add(new double[] {Double.valueOf(points[0]), Double.valueOf(points[1])});
-          } catch (NumberFormatException e) {
-            throw new RuntimeException("can not covert the string data to double", e);
-          }
-        }
-      }
-      if (!checkPointsSame(pointList[0], pointList[pointList.length - 1])) {
-        throw new RuntimeException("the first point and last point in polygon should be same");
-      } else {
-        List<Long[]> rangeList = getPolygonRangeList(queryList);
-        return rangeList;
-      }
+    try {
+      List<double[]> queryList = GeoHashUtils.getPointListFromPolygon(polygon);
+      return getPolygonRangeList(queryList);
+    } catch (Exception e) {

Review comment:
       specific exception type should be specified

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -95,6 +95,12 @@ class CarbonEnv {
     sparkSession.udf.register("text_match", new TextMatchUDF)
     sparkSession.udf.register("text_match_with_limit", new TextMatchMaxDocUDF)
     sparkSession.udf.register("in_polygon", new InPolygonUDF)
+    sparkSession.udf.register("in_polygon_list", new InPolygonListUDF)
+    sparkSession.udf.register("in_polyline_list", new InPolylineListUDF)
+    sparkSession.udf.register("in_polygon_range_list", new InPolygonRangeListUDF)
+

Review comment:
       register code should not at here, here is the overall process should not contain details func

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CustomIndex.java
##########
@@ -59,6 +59,14 @@
    */
   public abstract ReturnType query(String query) throws Exception;
 
+  /**
+   * Query processor for custom index.
+   * @param queryPointList query point list for GeoHashIndex
+   * @return Returns list of ranges to be fetched
+   * @throws Exception
+   */
+  public abstract ReturnType query(List<double[]> queryPointList) throws Exception;
+

Review comment:
       please add jira url to show the theory

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
##########
@@ -95,6 +95,12 @@ class CarbonEnv {
     sparkSession.udf.register("text_match", new TextMatchUDF)
     sparkSession.udf.register("text_match_with_limit", new TextMatchMaxDocUDF)
     sparkSession.udf.register("in_polygon", new InPolygonUDF)
+    sparkSession.udf.register("in_polygon_list", new InPolygonListUDF)
+    sparkSession.udf.register("in_polyline_list", new InPolylineListUDF)
+    sparkSession.udf.register("in_polygon_range_list", new InPolygonRangeListUDF)
+

Review comment:
       More and more, consider to use a function processing separately




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]