[GitHub] [carbondata] shenjiayu17 opened a new pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement


[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533986229



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
##########
@@ -276,17 +274,14 @@ class GeoTest extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
          | 'SPATIAL_INDEX.spatial.sourcecolumns'='longitude, latitude',
          | 'SPATIAL_INDEX.spatial.originLatitude'='39.832277',
          | 'SPATIAL_INDEX.spatial.gridSize'='60',
-         | 'SPATIAL_INDEX.spatial.minLongitude'='115.811865',
-         | 'SPATIAL_INDEX.spatial.maxLongitude'='116.782233',
-         | 'SPATIAL_INDEX.spatial.minLatitude'='39.832277',
-         | 'SPATIAL_INDEX.spatial.maxLatitude'='40.225281',
          | 'SPATIAL_INDEX.spatial.conversionRatio'='1000000')
        """.stripMargin)
     loadData(sourceTable)
     createTable(targetTable)
+    // INSERT INTO will keep SPATIAL_INDEX column from sourceTable instead of generating internally
     sql(s"insert into  $targetTable select * from $sourceTable")
-    checkAnswer(sql(s"select *from $targetTable where mygeohash = '2196036'"),
-      Seq(Row(2196036, 1575428400000L, 116337069, 39951887)))
+    checkAnswer(sql(s"select *from $targetTable where mygeohash = '233137655761'"),

Review comment:
       This is a requirement from Discovery. From the PR message: `3. Load data (include LOAD and INSERT INTO) allows user to input spatial index, which column will still generated internally when user does not give.`
   
   sourceTable contains a mygeohash value and the insert passes that value along, so it is copied to targetTable. If the insert does not supply this value, targetTable generates it internally.
   Admittedly, if the input mygeohash is wrong, it can end up inconsistent with the `gridSize` and `originLatitude` of targetTable.
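
   For context, the GeoHashUtils code quoted later in this thread builds the GeoID by interleaving the bits of the grid row and column. The following is a minimal sketch of that round trip, not the PR's code: the class name `GeoIdInterleaveSketch` and the method names are hypothetical, and the grid parameters are deliberately left out. It is one way to see that a user-supplied value carries no record of the `gridSize` or `originLatitude` it was computed with.

```java
final class GeoIdInterleaveSketch {

  // Row bits go to the odd positions, column bits to the even positions.
  static long toGeoId(int row, int column) {
    long geoId = 0L;
    long r = row;
    long c = column;
    for (int bit = 0; r > 0 || c > 0; bit++) {
      geoId |= ((r & 1L) << (2 * bit + 1)) | ((c & 1L) << (2 * bit));
      r >>= 1;
      c >>= 1;
    }
    return geoId;
  }

  // Inverse of toGeoId: returns {row, column}.
  static int[] toRowColumn(long geoId) {
    int row = 0;
    int column = 0;
    long source = geoId;
    for (int bit = 0; source > 0; bit++) {
      column |= (int) (source & 1L) << bit;
      source >>= 1;
      row |= (int) (source & 1L) << bit;
      source >>= 1;
    }
    return new int[] {row, column};
  }

  public static void main(String[] args) {
    long id = toGeoId(3, 5);
    int[] rc = toRowColumn(id);
    System.out.println(id + " -> row " + rc[0] + ", column " + rc[1]);
  }
}
```

   Two tables with different grid settings would decode the same number to the same row and column but map them to different locations, so only recomputing the value from longitude and latitude could detect a mismatch.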




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533987973



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
##########
@@ -354,16 +349,12 @@ class GeoTest extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
          | 'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
          | 'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
          | 'SPATIAL_INDEX.mygeohash.gridSize'='50',
-         | 'SPATIAL_INDEX.mygeohash.minLongitude'='115.811865',
-         | 'SPATIAL_INDEX.mygeohash.maxLongitude'='116.782233',
-         | 'SPATIAL_INDEX.mygeohash.minLatitude'='39.832277',
-         | 'SPATIAL_INDEX.mygeohash.maxLatitude'='40.225281',
          | 'SPATIAL_INDEX.mygeohash.conversionRatio'='1000000')
        """.stripMargin)
     sql(s"insert into $table1 select 0, 116337069, 39951887, 1575428400000")
     checkAnswer(
-      sql(s"select * from $table1 where mygeohash = '2196036'"),
-      Seq(Row(2196036, 116337069, 39951887, 1575428400000L)))
+      sql(s"select * from $table1 where mygeohash = '0'"),

Review comment:
       This is a requirement from Discovery, as written in the PR message.
   Because the user specifies the value of mygeohash, we do not generate it again.





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533959413



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java
##########
@@ -0,0 +1,411 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+public class GeoHashUtils {
+
+  /**
+   * Get the degree of each grid in the east-west direction.
+   *
+   * @param originLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Delta X is the degree of each grid in the east-west direction
+   */
+  public static double getDeltaX(double originLatitude, int gridSize) {
+    double mCos = Math.cos(originLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS * mCos);
+  }
+
+  /**
+   * Get the degree of each grid in the north-south direction.
+   *
+   * @param gridSize the grid size
+   * @return Delta Y is the degree of each grid in the north-south direction
+   */
+  public static double getDeltaY(int gridSize) {
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS);
+  }
+
+  /**
+   * Calculate the number of knives cut
+   *
+   * @param gridSize the grid size
+   * @param originLatitude the origin point latitude
+   * @return The number of knives cut
+   */
+  public static int getCutCount(int gridSize, double originLatitude) {
+    double deltaX = getDeltaX(originLatitude, gridSize);
+    int countX = Double.valueOf(
+        Math.ceil(Math.log(2 * GeoConstants.CONVERT_FACTOR / deltaX) / Math.log(2))).intValue();
+    double deltaY = getDeltaY(gridSize);
+    int countY = Double.valueOf(
+        Math.ceil(Math.log(GeoConstants.CONVERT_FACTOR / deltaY) / Math.log(2))).intValue();
+    return Math.max(countX, countY);
+  }
+
+  /**
+   * Convert input longitude and latitude to GeoID
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation.
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID
+   */
+  public static long lonLat2GeoID(long longitude, long latitude, double oriLatitude, int gridSize) {
+    long longtitudeByRatio = longitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    long latitudeByRatio = latitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    int[] ij = lonLat2ColRow(longtitudeByRatio, latitudeByRatio, oriLatitude, gridSize);
+    return colRow2GeoID(ij[0], ij[1]);
+  }
+
+  /**
+   * Calculate geo id through grid index coordinates, the row and column of grid coordinates
+   * can be transformed by latitude and longitude
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param oriLatitude the latitude of origin point,which is used to calculate the deltaX and cut
+   * level.
+   * @param gridSize the size of minimal grid after cut
+   * @return Grid ID value [row, column], column starts from 1
+   */
+  public static int[] lonLat2ColRow(long longitude, long latitude, double oriLatitude,
+      int gridSize) {
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    int column = (int) Math.floor(longitude / getDeltaX(oriLatitude, gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    int row = (int) Math.floor(latitude / getDeltaY(gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    return new int[] {row, column};
+  }
+
+  /**
+   * Calculate the corresponding GeoId value from the grid coordinates
+   *
+   * @param row Gridded row index
+   * @param column Gridded column index
+   * @return hash id
+   */
+  public static long colRow2GeoID(int row, int column) {
+    long geoID = 0L;
+    int bit = 0;
+    long sourceRow = (long) row;
+    long sourceColumn = (long)column;
+    while (sourceRow > 0 || sourceColumn > 0) {
+      geoID = geoID | ((sourceRow & 1) << (2 * bit + 1)) | ((sourceColumn & 1) << 2 * bit);
+      sourceRow >>= 1;
+      sourceColumn >>= 1;
+      bit++;
+    }
+    return geoID;
+  }
+
+  /**
+   * Convert input GeoID to longitude and latitude
+   *
+   * @param geoId GeoID
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Longitude and latitude of grid center point
+   */
+  public static double[] geoID2LngLat(long geoId, double oriLatitude, int gridSize) {
+    int[] rowCol = geoID2ColRow(geoId);
+    int column = rowCol[1];
+    int row = rowCol[0];
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    double deltaX = getDeltaX(oriLatitude, gridSize);
+    double deltaY = getDeltaY(gridSize);
+    double longitude = (column - (1 << (cutLevel - 1)) + 0.5) * deltaX;
+    double latitude = (row - (1 << (cutLevel - 1)) + 0.5) * deltaY;
+    longitude = new BigDecimal(longitude).setScale(GeoConstants.SCALE_OF_LONGITUDE_AND_LATITUDE,
+        BigDecimal.ROUND_HALF_UP).doubleValue();
+    latitude = new BigDecimal(latitude).setScale(GeoConstants.SCALE_OF_LONGITUDE_AND_LATITUDE,
+        BigDecimal.ROUND_HALF_UP).doubleValue();
+    return new double[]{longitude, latitude};
+  }
+
+  /**
+   * Convert input GeoID to grid column and row
+   *
+   * @param geoId GeoID
+   * @return grid column index and row index
+   */
+  public static int[] geoID2ColRow(long geoId) {
+    int row = 0;
+    int column = 0;
+    int bit = 0;
+    long source = geoId;
+    while (source > 0) {
+      column |= (source & 1) << bit;
+      source >>= 1;
+      row |= (source & 1) << bit;
+      source >>= 1;
+      bit++;
+    }
+    return new int[] {row, column};
+  }
+
+  /**
+   * Convert input string polygon to GeoID range list
+   *
+   * @param polygon input polygon string
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID range list of the polygon
+   */
+  public static List<Long[]> getRangeList(String polygon, double oriLatitude, int gridSize) {
+    List<double[]> queryPointList = getPointListFromPolygon(polygon);
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    double deltaY = getDeltaY(gridSize);
+    double maxLatitudeOfInitialArea = deltaY * Math.pow(2, cutLevel - 1);
+    double mCos = Math.cos(oriLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    double maxLongitudeOfInitialArea = maxLatitudeOfInitialArea / mCos;
+    double minLatitudeOfInitialArea = -maxLatitudeOfInitialArea;
+    double minLongitudeOfInitialArea = -maxLongitudeOfInitialArea;
+    QuadTreeCls qTreee = new QuadTreeCls(minLongitudeOfInitialArea, minLatitudeOfInitialArea,
+        maxLongitudeOfInitialArea, maxLatitudeOfInitialArea, cutLevel);
+    qTreee.insert(queryPointList);
+    return qTreee.getNodesData();
+  }
+
+  /**
+   * Convert input GeoID to upper layer GeoID of pyramid
+   *
+   * @param geoId GeoID
+   * @return the upper layer GeoID
+   */
+  public static long convertToUpperLayerGeoId(long geoId) {
+    return geoId >> 2;
+  }
+
+  /**
+   * Parse input polygon string to point list
+   *
+   * @param polygon input polygon string, example: POLYGON (35 10, 45 45, 15 40, 10 20, 35 10)
+   * @return the point list
+   */
+  public static List<double[]> getPointListFromPolygon(String polygon) {
+    String[] pointStringList = polygon.trim().split(GeoConstants.DEFAULT_DELIMITER);
+    if (4 > pointStringList.length) {
+      throw new RuntimeException(
+          "polygon need at least 3 points, really has " + pointStringList.length);
+    }
+    List<double[]> queryList = new ArrayList<>();
+    for (String pointString : pointStringList) {
+      String[] point = splitStringToPoint(pointString);
+      if (2 != point.length) {
+        throw new RuntimeException("longitude and latitude is a pair need 2 data");
+      }
+      try {
+        queryList.add(new double[] {Double.valueOf(point[0]), Double.valueOf(point[1])});
+      } catch (NumberFormatException e) {
+        throw new RuntimeException("can not covert the string data to double", e);
+      }
+    }
+    if (!checkPointsSame(pointStringList[0], pointStringList[pointStringList.length - 1])) {
+      throw new RuntimeException("the first point and last point in polygon should be same");
+    } else {
+      return queryList;
+    }
+  }
+
+  private static boolean checkPointsSame(String point1, String point2) {
+    String[] points1 = splitStringToPoint(point1);
+    String[] points2 = splitStringToPoint(point2);
+    return points1[0].equals(points2[0]) && points1[1].equals(points2[1]);
+  }
+
+  public static String[] splitStringToPoint(String str) {
+    return str.trim().split("\\s+");
+  }
+
+  public static void validateRangeList(List<Long[]> ranges) {
+    for (Long[] range : ranges) {
+      if (range.length != 2) {
+        throw new RuntimeException("Query processor must return list of ranges with each range "
+            + "containing minimum and maximum values");
+      }
+    }
+  }
+
+  /**
+   * Get two polygon's union and intersection
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list of processed set
+   */
+  public static List<Long[]> processRangeList(List<Long[]> rangeListA, List<Long[]> rangeListB,
+      String opType) {
+    List<Long[]> processedRangeList;
+    GeoOperationType operationType = GeoOperationType.getEnum(opType);
+    if (operationType == null) {
+      throw new RuntimeException("Unsupported operation type " + opType);
+    }
+    switch (operationType) {
+      case OR:
+        processedRangeList = getPolygonUnion(rangeListA, rangeListB);
+        break;
+      case AND:
+        processedRangeList = getPolygonIntersection(rangeListA, rangeListB);
+        break;
+      default:
+        throw new RuntimeException("Unsupported operation type " + opType);
+    }
+    return processedRangeList;
+  }
+
+  /**
+   * Get two polygon's union
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list after union
+   */
+  private static List<Long[]> getPolygonUnion(List<Long[]> rangeListA, List<Long[]> rangeListB) {
+    if (Objects.isNull(rangeListA)) {
+      return rangeListB;
+    }
+    if (Objects.isNull(rangeListB)) {
+      return rangeListA;
+    }
+    int sizeFirst = rangeListA.size();
+    int sizeSecond = rangeListB.size();
+    if (sizeFirst > sizeSecond) {
+      rangeListA.addAll(sizeFirst, rangeListB);
+      return mergeList(rangeListA);
+    } else {
+      rangeListB.addAll(sizeSecond, rangeListA);
+      return mergeList(rangeListB);
+    }
+  }
+
+  private static List<Long[]> mergeList(List<Long[]> list) {
+    if (list.size() == 0) {
+      return list;
+    }
+    Collections.sort(list, new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] arr1, Long[] arr2) {
+        return Long.compare(arr1[0], arr2[0]);
+      }
+    });
+    Long[] min;
+    Long[] max;
+    for (int i = 0; i < list.size(); i++) {
+      min = list.get(i);
+      for (int j = i + 1; j < list.size(); j++) {
+        max = list.get(j);
+        if (min[1] + 1 >= max[0]) {

Review comment:
       This part was developed by the Discovery team. We think your advice is right, and I have added a `break`.
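
   A minimal sketch of why an early `break` is safe here, assuming the ranges are inclusive `[min, max]` pairs sorted by their start value, as in the quoted `mergeList`. The class `RangeMergeSketch` and its method are hypothetical and use `long[]` instead of `Long[]`; the exact placement of the `break` in the PR may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RangeMergeSketch {

  // Merges overlapping or adjacent inclusive ranges; the input list is not modified.
  static List<long[]> mergeRanges(List<long[]> input) {
    List<long[]> list = new ArrayList<>();
    for (long[] range : input) {
      list.add(range.clone());
    }
    list.sort(Comparator.comparingLong((long[] r) -> r[0]));
    for (int i = 0; i < list.size(); i++) {
      long[] current = list.get(i);
      int j = i + 1;
      while (j < list.size()) {
        long[] next = list.get(j);
        if (current[1] + 1 < next[0]) {
          // Sorted by start: every later range starts even further right,
          // so none of them can touch the current range.
          break;
        }
        current[1] = Math.max(current[1], next[1]);
        list.remove(j);
      }
    }
    return list;
  }

  public static void main(String[] args) {
    List<long[]> ranges = new ArrayList<>();
    ranges.add(new long[] {5, 7});
    ranges.add(new long[] {1, 3});
    ranges.add(new long[] {4, 4});
    ranges.add(new long[] {10, 12});
    ranges.add(new long[] {11, 15});
    for (long[] r : mergeRanges(ranges)) {
      System.out.println(r[0] + "-" + r[1]);
    }
  }
}
```

   Without the `break`, the inner loop keeps scanning ranges that can no longer merge, which makes the pass quadratic even when nothing overlaps.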





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737090938


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5015/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737092372


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3259/
   



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r534107071



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java
##########
@@ -0,0 +1,411 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+public class GeoHashUtils {
+
+  /**
+   * Get the degree of each grid in the east-west direction.
+   *
+   * @param originLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Delta X is the degree of each grid in the east-west direction
+   */
+  public static double getDeltaX(double originLatitude, int gridSize) {
+    double mCos = Math.cos(originLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS * mCos);
+  }
+
+  /**
+   * Get the degree of each grid in the north-south direction.
+   *
+   * @param gridSize the grid size
+   * @return Delta Y is the degree of each grid in the north-south direction
+   */
+  public static double getDeltaY(int gridSize) {
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS);
+  }
+
+  /**
+   * Calculate the number of knives cut
+   *
+   * @param gridSize the grid size
+   * @param originLatitude the origin point latitude
+   * @return The number of knives cut
+   */
+  public static int getCutCount(int gridSize, double originLatitude) {
+    double deltaX = getDeltaX(originLatitude, gridSize);
+    int countX = Double.valueOf(
+        Math.ceil(Math.log(2 * GeoConstants.CONVERT_FACTOR / deltaX) / Math.log(2))).intValue();
+    double deltaY = getDeltaY(gridSize);
+    int countY = Double.valueOf(
+        Math.ceil(Math.log(GeoConstants.CONVERT_FACTOR / deltaY) / Math.log(2))).intValue();
+    return Math.max(countX, countY);
+  }
+
+  /**
+   * Convert input longitude and latitude to GeoID
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation.
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID
+   */
+  public static long lonLat2GeoID(long longitude, long latitude, double oriLatitude, int gridSize) {
+    long longtitudeByRatio = longitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    long latitudeByRatio = latitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    int[] ij = lonLat2ColRow(longtitudeByRatio, latitudeByRatio, oriLatitude, gridSize);
+    return colRow2GeoID(ij[0], ij[1]);
+  }
+
+  /**
+   * Calculate geo id through grid index coordinates, the row and column of grid coordinates
+   * can be transformed by latitude and longitude
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param oriLatitude the latitude of origin point,which is used to calculate the deltaX and cut
+   * level.
+   * @param gridSize the size of minimal grid after cut
+   * @return Grid ID value [row, column], column starts from 1
+   */
+  public static int[] lonLat2ColRow(long longitude, long latitude, double oriLatitude,
+      int gridSize) {
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    int column = (int) Math.floor(longitude / getDeltaX(oriLatitude, gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    int row = (int) Math.floor(latitude / getDeltaY(gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    return new int[] {row, column};
+  }
+
+  /**
+   * Calculate the corresponding GeoId value from the grid coordinates
+   *
+   * @param row Gridded row index
+   * @param column Gridded column index
+   * @return hash id
+   */
+  public static long colRow2GeoID(int row, int column) {
+    long geoID = 0L;
+    int bit = 0;
+    long sourceRow = (long) row;
+    long sourceColumn = (long)column;
+    while (sourceRow > 0 || sourceColumn > 0) {
+      geoID = geoID | ((sourceRow & 1) << (2 * bit + 1)) | ((sourceColumn & 1) << 2 * bit);
+      sourceRow >>= 1;
+      sourceColumn >>= 1;
+      bit++;
+    }
+    return geoID;
+  }
+
+  /**
+   * Convert input GeoID to longitude and latitude
+   *
+   * @param geoId GeoID
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Longitude and latitude of grid center point
+   */
+  public static double[] geoID2LngLat(long geoId, double oriLatitude, int gridSize) {
+    int[] rowCol = geoID2ColRow(geoId);
+    int column = rowCol[1];
+    int row = rowCol[0];
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    double deltaX = getDeltaX(oriLatitude, gridSize);
+    double deltaY = getDeltaY(gridSize);
+    double longitude = (column - (1 << (cutLevel - 1)) + 0.5) * deltaX;
+    double latitude = (row - (1 << (cutLevel - 1)) + 0.5) * deltaY;
+    longitude = new BigDecimal(longitude).setScale(GeoConstants.SCALE_OF_LONGITUDE_AND_LATITUDE,
+        BigDecimal.ROUND_HALF_UP).doubleValue();
+    latitude = new BigDecimal(latitude).setScale(GeoConstants.SCALE_OF_LONGITUDE_AND_LATITUDE,
+        BigDecimal.ROUND_HALF_UP).doubleValue();
+    return new double[]{longitude, latitude};
+  }
+
+  /**
+   * Convert input GeoID to grid column and row
+   *
+   * @param geoId GeoID
+   * @return grid column index and row index
+   */
+  public static int[] geoID2ColRow(long geoId) {
+    int row = 0;
+    int column = 0;
+    int bit = 0;
+    long source = geoId;
+    while (source > 0) {
+      column |= (source & 1) << bit;
+      source >>= 1;
+      row |= (source & 1) << bit;
+      source >>= 1;
+      bit++;
+    }
+    return new int[] {row, column};
+  }
+
+  /**
+   * Convert input string polygon to GeoID range list
+   *
+   * @param polygon input polygon string
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID range list of the polygon
+   */
+  public static List<Long[]> getRangeList(String polygon, double oriLatitude, int gridSize) {
+    List<double[]> queryPointList = getPointListFromPolygon(polygon);
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    double deltaY = getDeltaY(gridSize);
+    double maxLatitudeOfInitialArea = deltaY * Math.pow(2, cutLevel - 1);
+    double mCos = Math.cos(oriLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    double maxLongitudeOfInitialArea = maxLatitudeOfInitialArea / mCos;
+    double minLatitudeOfInitialArea = -maxLatitudeOfInitialArea;
+    double minLongitudeOfInitialArea = -maxLongitudeOfInitialArea;
+    QuadTreeCls qTreee = new QuadTreeCls(minLongitudeOfInitialArea, minLatitudeOfInitialArea,
+        maxLongitudeOfInitialArea, maxLatitudeOfInitialArea, cutLevel);
+    qTreee.insert(queryPointList);
+    return qTreee.getNodesData();
+  }
+
+  /**
+   * Convert input GeoID to upper layer GeoID of pyramid
+   *
+   * @param geoId GeoID
+   * @return the upper layer GeoID
+   */
+  public static long convertToUpperLayerGeoId(long geoId) {
+    return geoId >> 2;
+  }
+
+  /**
+   * Parse input polygon string to point list
+   *
+   * @param polygon input polygon string, example: POLYGON (35 10, 45 45, 15 40, 10 20, 35 10)
+   * @return the point list
+   */
+  public static List<double[]> getPointListFromPolygon(String polygon) {
+    String[] pointStringList = polygon.trim().split(GeoConstants.DEFAULT_DELIMITER);
+    if (4 > pointStringList.length) {
+      throw new RuntimeException(
+          "polygon need at least 3 points, really has " + pointStringList.length);
+    }
+    List<double[]> queryList = new ArrayList<>();
+    for (String pointString : pointStringList) {
+      String[] point = splitStringToPoint(pointString);
+      if (2 != point.length) {
+        throw new RuntimeException("longitude and latitude is a pair need 2 data");
+      }
+      try {
+        queryList.add(new double[] {Double.valueOf(point[0]), Double.valueOf(point[1])});
+      } catch (NumberFormatException e) {
+        throw new RuntimeException("can not covert the string data to double", e);
+      }
+    }
+    if (!checkPointsSame(pointStringList[0], pointStringList[pointStringList.length - 1])) {
+      throw new RuntimeException("the first point and last point in polygon should be same");
+    } else {
+      return queryList;
+    }
+  }
+
+  private static boolean checkPointsSame(String point1, String point2) {
+    String[] points1 = splitStringToPoint(point1);
+    String[] points2 = splitStringToPoint(point2);
+    return points1[0].equals(points2[0]) && points1[1].equals(points2[1]);
+  }
+
+  public static String[] splitStringToPoint(String str) {
+    return str.trim().split("\\s+");
+  }
+
+  public static void validateRangeList(List<Long[]> ranges) {
+    for (Long[] range : ranges) {
+      if (range.length != 2) {
+        throw new RuntimeException("Query processor must return list of ranges with each range "
+            + "containing minimum and maximum values");
+      }
+    }
+  }
+
+  /**
+   * Get two polygon's union and intersection
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list of processed set
+   */
+  public static List<Long[]> processRangeList(List<Long[]> rangeListA, List<Long[]> rangeListB,
+      String opType) {
+    List<Long[]> processedRangeList;
+    GeoOperationType operationType = GeoOperationType.getEnum(opType);
+    if (operationType == null) {
+      throw new RuntimeException("Unsupported operation type " + opType);
+    }
+    switch (operationType) {
+      case OR:
+        processedRangeList = getPolygonUnion(rangeListA, rangeListB);
+        break;
+      case AND:
+        processedRangeList = getPolygonIntersection(rangeListA, rangeListB);
+        break;
+      default:
+        throw new RuntimeException("Unsupported operation type " + opType);
+    }
+    return processedRangeList;
+  }
+
+  /**
+   * Get two polygon's union
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list after union
+   */
+  private static List<Long[]> getPolygonUnion(List<Long[]> rangeListA, List<Long[]> rangeListB) {
+    if (Objects.isNull(rangeListA)) {
+      return rangeListB;
+    }
+    if (Objects.isNull(rangeListB)) {
+      return rangeListA;
+    }
+    int sizeFirst = rangeListA.size();
+    int sizeSecond = rangeListB.size();
+    if (sizeFirst > sizeSecond) {
+      rangeListA.addAll(sizeFirst, rangeListB);
+      return mergeList(rangeListA);
+    } else {
+      rangeListB.addAll(sizeSecond, rangeListA);
+      return mergeList(rangeListB);
+    }
+  }
+
+  private static List<Long[]> mergeList(List<Long[]> list) {
+    if (list.size() == 0) {
+      return list;
+    }
+    Collections.sort(list, new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] arr1, Long[] arr2) {
+        return Long.compare(arr1[0], arr2[0]);
+      }
+    });
+    Long[] min;
+    Long[] max;
+    for (int i = 0; i < list.size(); i++) {
+      min = list.get(i);
+      for (int j = i + 1; j < list.size(); j++) {
+        max = list.get(j);
+        if (min[1] + 1 >= max[0]) {
+          min[1] = Math.max(max[1], min[1]);
+          list.remove(j);
+          j--;
+        }
+      }
+    }
+    return list;
+  }
+
+  /**
+   * Get two polygon's intersection
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list after intersection
+   */
+  private static List<Long[]> getPolygonIntersection(List<Long[]> rangeListA,
+      List<Long[]> rangeListB) {
+    List<Long[]> intersectionList = new ArrayList<>();
+    if (Objects.isNull(rangeListA) || Objects.isNull(rangeListB)) {
+      return Collections.emptyList();
+    }
+    int endIndex1 = rangeListA.size();
+    int endIndex2 = rangeListB.size();
+    int startIndex1 = 0;
+    int startIndex2 = 0;
+
+    while (startIndex1 < endIndex1 && startIndex2 < endIndex2) {

Review comment:
       This part of the code was developed by Discovery. The `if` block handles the no-intersection case and the `else` block handles the intersection case. They think this `if`/`else` may be faster in some cases. What do you think?
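
       For comparison, here is a minimal sketch of a two-pointer intersection over two range lists. It is not the code from this PR. The class name `RangeIntersectionSketch` and the method `intersect` are hypothetical, and the sketch assumes that both inputs are sorted by start value, that ranges within one list do not overlap, and that both bounds of a range are inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not part of GeoHashUtils.
public class RangeIntersectionSketch {

  // Assumption: each list is sorted by range start and has no internal overlaps.
  static List<Long[]> intersect(List<Long[]> a, List<Long[]> b) {
    List<Long[]> result = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < a.size() && j < b.size()) {
      // The candidate overlap is bounded by the later start and the earlier end.
      long start = Math.max(a.get(i)[0], b.get(j)[0]);
      long end = Math.min(a.get(i)[1], b.get(j)[1]);
      if (start <= end) {
        result.add(new Long[] {start, end});
      }
      // Advance the list whose current range ends first; the other range
      // may still overlap the next range of the advanced list.
      if (a.get(i)[1] < b.get(j)[1]) {
        i++;
      } else {
        j++;
      }
    }
    return result;
  }
}
```

       Each range is visited once, so the loop runs in time linear in the combined size of the two lists. Whether it beats the `if`/`else` variant in practice would need to be measured.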




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r534240300



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private void sortRange(List<Long[]> rangeList) {
+    rangeList.sort(new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] x, Long[] y) {
+        return Long.compare(x[0], y[0]);
+      }
+    });
+  }
+
+  private void combineRange(List<Long[]> rangeList) {
+    if (rangeList.size() > 1) {
+      for (int i = 0, j = i + 1; i < rangeList.size() - 1; i++, j++) {
+        long previousEnd = rangeList.get(i)[1];
+        long nextStart = rangeList.get(j)[0];
+        if (previousEnd + 1 == nextStart) {

Review comment:
       Corrected it for the overlapping case [[2,8], [5,10]] and the containing case [[2,8], [5,6]].
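
       As an illustration of these cases, here is a minimal sketch of a single-pass merge that treats adjacent, overlapping and contained ranges the same way. It is not the corrected code from this PR. The class name `RangeMergeSketch` and the method `merge` are hypothetical, and the sketch assumes inclusive bounds and end values below `Long.MAX_VALUE` (so that `end + 1` does not overflow).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only, not part of PolygonRangeListExpression.
public class RangeMergeSketch {

  static List<Long[]> merge(List<Long[]> ranges) {
    // Sort a copy by range start so the caller's list order is left untouched.
    List<Long[]> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong((Long[] r) -> r[0]));

    List<Long[]> merged = new ArrayList<>();
    for (Long[] range : sorted) {
      if (!merged.isEmpty()) {
        Long[] last = merged.get(merged.size() - 1);
        // Adjacent (end + 1 == start), overlapping and contained ranges
        // all satisfy this condition once the list is sorted by start.
        if (last[1] + 1 >= range[0]) {
          // Keep the larger end so a contained range does not shrink the result.
          last[1] = Math.max(last[1], range[1]);
          continue;
        }
      }
      // Copy the pair so the input arrays are never modified.
      merged.add(new Long[] {range[0], range[1]});
    }
    return merged;
  }
}
```

       With this sketch, [[2,8], [5,10]] merges to [[2,10]], [[2,8], [5,6]] merges to [[2,8]], and the adjacent pair [[1,3], [4,6]] merges to [[1,6]].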





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r534240585



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private void sortRange(List<Long[]> rangeList) {
+    rangeList.sort(new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] x, Long[] y) {
+        return Long.compare(x[0], y[0]);
+      }
+    });
+  }
+
+  private void combineRange(List<Long[]> rangeList) {
+    if (rangeList.size() > 1) {
+      for (int i = 0, j = i + 1; i < rangeList.size() - 1; i++, j++) {
+        long previousEnd = rangeList.get(i)[1];
+        long nextStart = rangeList.get(j)[0];
+        if (previousEnd + 1 == nextStart) {
+          rangeList.get(j)[0] = rangeList.get(i)[0];
+          rangeList.get(i)[0] = null;
+          rangeList.get(i)[1] = null;
+        }
+      }
+      rangeList.removeIf(item -> item[0] == null && item[1] == null);
+    }
+  }
+
+  private List<Long[]> getRangeListFromString(String rangeListString) {
+    String[] rangeStringList = rangeListString.trim().split(GeoConstants.DEFAULT_DELIMITER);
+    List<Long[]> rangeList = new ArrayList<>();
+    for (String rangeString : rangeStringList) {
+      String[] range = GeoHashUtils.splitStringToPoint(rangeString);
+      if (range.length != 2) {
+        throw new RuntimeException("each range is a pair need 2 data");
+      }
+      try {
+        rangeList.add(new Long[] {Long.valueOf(range[0]), Long.valueOf(range[1])});

Review comment:
       Added this check.





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737354631


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5033/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737358091


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3276/
   



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r535124613



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java
##########
@@ -0,0 +1,411 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+public class GeoHashUtils {
+
+  /**
+   * Get the degree of each grid in the east-west direction.
+   *
+   * @param originLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Delta X is the degree of each grid in the east-west direction
+   */
+  public static double getDeltaX(double originLatitude, int gridSize) {
+    double mCos = Math.cos(originLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS * mCos);
+  }
+
+  /**
+   * Get the degree of each grid in the north-south direction.
+   *
+   * @param gridSize the grid size
+   * @return Delta Y is the degree of each grid in the north-south direction
+   */
+  public static double getDeltaY(int gridSize) {
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS);
+  }
+
+  /**
+   * Calculate the number of knives cut
+   *
+   * @param gridSize the grid size
+   * @param originLatitude the origin point latitude
+   * @return The number of knives cut
+   */
+  public static int getCutCount(int gridSize, double originLatitude) {
+    double deltaX = getDeltaX(originLatitude, gridSize);
+    int countX = Double.valueOf(
+        Math.ceil(Math.log(2 * GeoConstants.CONVERT_FACTOR / deltaX) / Math.log(2))).intValue();
+    double deltaY = getDeltaY(gridSize);
+    int countY = Double.valueOf(
+        Math.ceil(Math.log(GeoConstants.CONVERT_FACTOR / deltaY) / Math.log(2))).intValue();
+    return Math.max(countX, countY);
+  }
+
+  /**
+   * Convert input longitude and latitude to GeoID
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation.
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID
+   */
+  public static long lonLat2GeoID(long longitude, long latitude, double oriLatitude, int gridSize) {
+    long longtitudeByRatio = longitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    long latitudeByRatio = latitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    int[] ij = lonLat2ColRow(longtitudeByRatio, latitudeByRatio, oriLatitude, gridSize);
+    return colRow2GeoID(ij[0], ij[1]);
+  }
+
+  /**
+   * Calculate geo id through grid index coordinates, the row and column of grid coordinates
+   * can be transformed by latitude and longitude
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param oriLatitude the latitude of origin point,which is used to calculate the deltaX and cut
+   * level.
+   * @param gridSize the size of minimal grid after cut
+   * @return Grid ID value [row, column], column starts from 1
+   */
+  public static int[] lonLat2ColRow(long longitude, long latitude, double oriLatitude,
+      int gridSize) {
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    int column = (int) Math.floor(longitude / getDeltaX(oriLatitude, gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    int row = (int) Math.floor(latitude / getDeltaY(gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    return new int[] {row, column};
+  }
+
+  /**
+   * Calculate the corresponding GeoId value from the grid coordinates
+   *
+   * @param row Gridded row index
+   * @param column Gridded column index
+   * @return hash id
+   */
+  public static long colRow2GeoID(int row, int column) {
+    long geoID = 0L;
+    int bit = 0;
+    long sourceRow = (long) row;
+    long sourceColumn = (long)column;
+    while (sourceRow > 0 || sourceColumn > 0) {
+      geoID = geoID | ((sourceRow & 1) << (2 * bit + 1)) | ((sourceColumn & 1) << 2 * bit);
+      sourceRow >>= 1;
+      sourceColumn >>= 1;
+      bit++;
+    }
+    return geoID;
+  }
+
+  /**
+   * Convert input GeoID to longitude and latitude
+   *
+   * @param geoId GeoID
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Longitude and latitude of grid center point
+   */
+  public static double[] geoID2LngLat(long geoId, double oriLatitude, int gridSize) {
+    int[] rowCol = geoID2ColRow(geoId);
+    int column = rowCol[1];
+    int row = rowCol[0];
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    double deltaX = getDeltaX(oriLatitude, gridSize);
+    double deltaY = getDeltaY(gridSize);
+    double longitude = (column - (1 << (cutLevel - 1)) + 0.5) * deltaX;
+    double latitude = (row - (1 << (cutLevel - 1)) + 0.5) * deltaY;
+    longitude = new BigDecimal(longitude).setScale(GeoConstants.SCALE_OF_LONGITUDE_AND_LATITUDE,
+        BigDecimal.ROUND_HALF_UP).doubleValue();
+    latitude = new BigDecimal(latitude).setScale(GeoConstants.SCALE_OF_LONGITUDE_AND_LATITUDE,
+        BigDecimal.ROUND_HALF_UP).doubleValue();
+    return new double[]{longitude, latitude};
+  }
+
+  /**
+   * Convert input GeoID to grid column and row
+   *
+   * @param geoId GeoID
+   * @return grid column index and row index
+   */
+  public static int[] geoID2ColRow(long geoId) {
+    int row = 0;
+    int column = 0;
+    int bit = 0;
+    long source = geoId;
+    while (source > 0) {
+      column |= (source & 1) << bit;
+      source >>= 1;
+      row |= (source & 1) << bit;
+      source >>= 1;
+      bit++;
+    }
+    return new int[] {row, column};
+  }
+
+  /**
+   * Convert input string polygon to GeoID range list
+   *
+   * @param polygon input polygon string
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID range list of the polygon
+   */
+  public static List<Long[]> getRangeList(String polygon, double oriLatitude, int gridSize) {
+    List<double[]> queryPointList = getPointListFromPolygon(polygon);
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    double deltaY = getDeltaY(gridSize);
+    double maxLatitudeOfInitialArea = deltaY * Math.pow(2, cutLevel - 1);
+    double mCos = Math.cos(oriLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    double maxLongitudeOfInitialArea = maxLatitudeOfInitialArea / mCos;
+    double minLatitudeOfInitialArea = -maxLatitudeOfInitialArea;
+    double minLongitudeOfInitialArea = -maxLongitudeOfInitialArea;
+    QuadTreeCls qTreee = new QuadTreeCls(minLongitudeOfInitialArea, minLatitudeOfInitialArea,
+        maxLongitudeOfInitialArea, maxLatitudeOfInitialArea, cutLevel);
+    qTreee.insert(queryPointList);
+    return qTreee.getNodesData();
+  }
+
+  /**
+   * Convert input GeoID to upper layer GeoID of pyramid
+   *
+   * @param geoId GeoID
+   * @return the upper layer GeoID
+   */
+  public static long convertToUpperLayerGeoId(long geoId) {
+    return geoId >> 2;
+  }
+
+  /**
+   * Parse input polygon string to point list
+   *
+   * @param polygon input polygon string, example: POLYGON (35 10, 45 45, 15 40, 10 20, 35 10)
+   * @return the point list
+   */
+  public static List<double[]> getPointListFromPolygon(String polygon) {
+    String[] pointStringList = polygon.trim().split(GeoConstants.DEFAULT_DELIMITER);
+    if (4 > pointStringList.length) {
+      throw new RuntimeException(
+          "polygon need at least 3 points, really has " + pointStringList.length);
+    }
+    List<double[]> queryList = new ArrayList<>();
+    for (String pointString : pointStringList) {
+      String[] point = splitStringToPoint(pointString);
+      if (2 != point.length) {
+        throw new RuntimeException("longitude and latitude is a pair need 2 data");
+      }
+      try {
+        queryList.add(new double[] {Double.valueOf(point[0]), Double.valueOf(point[1])});
+      } catch (NumberFormatException e) {
+        throw new RuntimeException("can not covert the string data to double", e);
+      }
+    }
+    if (!checkPointsSame(pointStringList[0], pointStringList[pointStringList.length - 1])) {
+      throw new RuntimeException("the first point and last point in polygon should be same");
+    } else {
+      return queryList;
+    }
+  }
+
+  private static boolean checkPointsSame(String point1, String point2) {
+    String[] points1 = splitStringToPoint(point1);
+    String[] points2 = splitStringToPoint(point2);
+    return points1[0].equals(points2[0]) && points1[1].equals(points2[1]);
+  }
+
+  public static String[] splitStringToPoint(String str) {
+    return str.trim().split("\\s+");
+  }
+
+  public static void validateRangeList(List<Long[]> ranges) {
+    for (Long[] range : ranges) {
+      if (range.length != 2) {
+        throw new RuntimeException("Query processor must return list of ranges with each range "
+            + "containing minimum and maximum values");
+      }
+    }
+  }
+
+  /**
+   * Get two polygon's union and intersection
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list of processed set
+   */
+  public static List<Long[]> processRangeList(List<Long[]> rangeListA, List<Long[]> rangeListB,
+      String opType) {
+    List<Long[]> processedRangeList;
+    GeoOperationType operationType = GeoOperationType.getEnum(opType);
+    if (operationType == null) {
+      throw new RuntimeException("Unsupported operation type " + opType);
+    }
+    switch (operationType) {
+      case OR:
+        processedRangeList = getPolygonUnion(rangeListA, rangeListB);
+        break;
+      case AND:
+        processedRangeList = getPolygonIntersection(rangeListA, rangeListB);
+        break;
+      default:
+        throw new RuntimeException("Unsupported operation type " + opType);
+    }
+    return processedRangeList;
+  }
+
+  /**
+   * Get two polygon's union
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list after union
+   */
+  private static List<Long[]> getPolygonUnion(List<Long[]> rangeListA, List<Long[]> rangeListB) {
+    if (Objects.isNull(rangeListA)) {
+      return rangeListB;
+    }
+    if (Objects.isNull(rangeListB)) {
+      return rangeListA;
+    }
+    int sizeFirst = rangeListA.size();
+    int sizeSecond = rangeListB.size();
+    if (sizeFirst > sizeSecond) {
+      rangeListA.addAll(sizeFirst, rangeListB);
+      return mergeList(rangeListA);
+    } else {
+      rangeListB.addAll(sizeSecond, rangeListA);
+      return mergeList(rangeListB);
+    }
+  }
+
+  private static List<Long[]> mergeList(List<Long[]> list) {
+    if (list.size() == 0) {
+      return list;
+    }
+    Collections.sort(list, new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] arr1, Long[] arr2) {
+        return Long.compare(arr1[0], arr2[0]);
+      }
+    });
+    Long[] min;
+    Long[] max;
+    for (int i = 0; i < list.size(); i++) {
+      min = list.get(i);
+      for (int j = i + 1; j < list.size(); j++) {
+        max = list.get(j);
+        if (min[1] + 1 >= max[0]) {
+          min[1] = Math.max(max[1], min[1]);
+          list.remove(j);
+          j--;
+        }
+      }
+    }
+    return list;
+  }
+
+  /**
+   * Get two polygon's intersection
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list after intersection
+   */
+  private static List<Long[]> getPolygonIntersection(List<Long[]> rangeListA,
+      List<Long[]> rangeListB) {
+    List<Long[]> intersectionList = new ArrayList<>();
+    if (Objects.isNull(rangeListA) || Objects.isNull(rangeListB)) {
+      return Collections.emptyList();
+    }
+    int endIndex1 = rangeListA.size();
+    int endIndex2 = rangeListB.size();
+    int startIndex1 = 0;
+    int startIndex2 = 0;
+
+    while (startIndex1 < endIndex1 && startIndex2 < endIndex2) {

Review comment:
       I have optimized the code following your suggestion.





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737968960


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5043/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737977578


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3285/
   



[GitHub] [carbondata] shenjiayu17 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738483684


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738527880


   Build Success with Spark 2.3.4. Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5055/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738528336


   Build Success with Spark 2.4.5. Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3296/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-739480249







[GitHub] [carbondata] shenjiayu17 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-739609378


   retest this please



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533974578



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolylineListExpression.java
##########
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.GeoOperationType;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+import org.locationtech.jts.geom.Coordinate;
+import org.locationtech.jts.geom.Geometry;
+import org.locationtech.jts.geom.GeometryFactory;
+import org.locationtech.jts.geom.LineString;
+import org.locationtech.jts.geom.Polygon;
+import org.locationtech.jts.io.WKTReader;
+import org.locationtech.jts.operation.buffer.BufferParameters;
+
+/**
+ * InPolylineList expression processor. It inputs the InPolylineList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolylineListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private static final GeometryFactory geoFactory = new GeometryFactory();
+
+  private String polylineString;
+
+  private Float bufferInMeter;
+
+  private GeoHashIndex instance;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolylineListExpression(String polylineString, Float bufferInMeter, String columnName,
+      CustomIndex indexInstance) {
+    this.polylineString = polylineString;
+    this.bufferInMeter = bufferInMeter;
+    this.instance = (GeoHashIndex) indexInstance;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // transform the distance unit meter to degree
+      double buffer = bufferInMeter / GeoConstants.CONVERSION_FACTOR_OF_METER_TO_DEGREE;
+
+      // 1. parse the polyline list string and get polygon from each polyline
+      List<Geometry> polygonList = new ArrayList<>();
+      WKTReader wktReader = new WKTReader();
+      Pattern pattern = Pattern.compile(GeoConstants.POLYLINE_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polylineString);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        LineString polylineCreatedFromStr = (LineString) wktReader.read(matchedStr);
+        Polygon polygonFromPolylineBuffer = (Polygon) polylineCreatedFromStr.buffer(
+            buffer, 0, BufferParameters.CAP_SQUARE);
+        polygonList.add(polygonFromPolylineBuffer);
+      }
+      // 2. get the range list of each polygon
+      if (polygonList.size() > 0) {

Review comment:
   `IN_POLYLINE_LIST('LINESTRING (120.199242 30.324464, 120.190359 30.315388)', 65)`
   The regular expression matches each `LINESTRING (120.199242 30.324464, 120.190359 30.315388)` part and gets the string inside `()`. The UDF can receive one or more LINESTRINGs in the first parameter of IN_POLYLINE_LIST.
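
   A minimal sketch of how a regular expression could pull several LINESTRING entries out of one argument, as described above. The pattern here is an assumption for illustration only; the actual value of `GeoConstants.POLYLINE_REG_EXPRESSION` is not shown in this thread. The class name, method names, and the second LINESTRING in the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolylineListSketch {

  // Assumed pattern, NOT the real GeoConstants.POLYLINE_REG_EXPRESSION:
  // the keyword LINESTRING, optional whitespace, then a parenthesised coordinate list.
  private static final Pattern LINESTRING =
      Pattern.compile("LINESTRING\\s*\\(([^()]*)\\)");

  // Returns every full "LINESTRING (...)" match found in the input, in order.
  public static List<String> findLineStrings(String polylineList) {
    List<String> matches = new ArrayList<>();
    Matcher matcher = LINESTRING.matcher(polylineList);
    while (matcher.find()) {
      matches.add(matcher.group());
    }
    return matches;
  }

  // Returns only the text inside the parentheses of each match (capture group 1).
  public static List<String> findCoordinateLists(String polylineList) {
    List<String> coordinateLists = new ArrayList<>();
    Matcher matcher = LINESTRING.matcher(polylineList);
    while (matcher.find()) {
      coordinateLists.add(matcher.group(1).trim());
    }
    return coordinateLists;
  }

  public static void main(String[] args) {
    // First entry is taken from the review comment; the second is made up for illustration.
    String firstParameter =
        "LINESTRING (120.199242 30.324464, 120.190359 30.315388), "
            + "LINESTRING (120.184179 30.327465, 120.191603 30.328946)";
    for (String lineString : findLineStrings(firstParameter)) {
      System.out.println("matched: " + lineString);
    }
    for (String coordinates : findCoordinateLists(firstParameter)) {
      System.out.println("inside (): " + coordinates);
    }
  }
}
```

   In the quoted diff, the whole matched string from `matcher.group()` is what gets passed to `WKTReader.read`, so each match must be a complete LINESTRING in WKT form. The capture group in this sketch only illustrates "the string inside `()`".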



