[GitHub] [carbondata] shenjiayu17 opened a new pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement


[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

VenuReddy2103 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r532663075



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
##########
@@ -276,17 +274,14 @@ class GeoTest extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
          | 'SPATIAL_INDEX.spatial.sourcecolumns'='longitude, latitude',
          | 'SPATIAL_INDEX.spatial.originLatitude'='39.832277',
          | 'SPATIAL_INDEX.spatial.gridSize'='60',
-         | 'SPATIAL_INDEX.spatial.minLongitude'='115.811865',
-         | 'SPATIAL_INDEX.spatial.maxLongitude'='116.782233',
-         | 'SPATIAL_INDEX.spatial.minLatitude'='39.832277',
-         | 'SPATIAL_INDEX.spatial.maxLatitude'='40.225281',
          | 'SPATIAL_INDEX.spatial.conversionRatio'='1000000')
        """.stripMargin)
     loadData(sourceTable)
     createTable(targetTable)
+    // INSERT INTO will keep SPATIAL_INDEX column from sourceTable instead of generating internally
     sql(s"insert into  $targetTable select * from $sourceTable")
-    checkAnswer(sql(s"select *from $targetTable where mygeohash = '2196036'"),
-      Seq(Row(2196036, 1575428400000L, 116337069, 39951887)))
+    checkAnswer(sql(s"select *from $targetTable where mygeohash = '233137655761'"),

Review comment:
       We seem to be copying the mygeohash value of sourceTable as is to targetTable, even when the `gridSize` and `originLatitude` params of the source and target tables differ. This looks like a problem: if the user calls UDFs such as longlat2geoid or geoId2longlat, should they pass the gridSize and originLatitude of the source table or of the target table?
   Is there any particular reason to avoid generating the mygeohash value for the target table?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

VenuReddy2103 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r532665839



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
##########
@@ -354,16 +349,12 @@ class GeoTest extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
          | 'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
          | 'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
          | 'SPATIAL_INDEX.mygeohash.gridSize'='50',
-         | 'SPATIAL_INDEX.mygeohash.minLongitude'='115.811865',
-         | 'SPATIAL_INDEX.mygeohash.maxLongitude'='116.782233',
-         | 'SPATIAL_INDEX.mygeohash.minLatitude'='39.832277',
-         | 'SPATIAL_INDEX.mygeohash.maxLatitude'='40.225281',
          | 'SPATIAL_INDEX.mygeohash.conversionRatio'='1000000')
        """.stripMargin)
     sql(s"insert into $table1 select 0, 116337069, 39951887, 1575428400000")
     checkAnswer(
-      sql(s"select * from $table1 where mygeohash = '2196036'"),
-      Seq(Row(2196036, 116337069, 39951887, 1575428400000L)))
+      sql(s"select * from $table1 where mygeohash = '0'"),

Review comment:
       Why shouldn't we generate the mygeohash value?





[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

VenuReddy2103 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r532710929



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private void sortRange(List<Long[]> rangeList) {
+    rangeList.sort(new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] x, Long[] y) {
+        return Long.compare(x[0], y[0]);
+      }
+    });
+  }
+
+  private void combineRange(List<Long[]> rangeList) {
+    if (rangeList.size() > 1) {
+      for (int i = 0, j = i + 1; i < rangeList.size() - 1; i++, j++) {
+        long previousEnd = rangeList.get(i)[1];
+        long nextStart = rangeList.get(j)[0];
+        if (previousEnd + 1 == nextStart) {

Review comment:
       Overlapping ranges do not seem to be handled here?
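
       For illustration, one common way to cover overlapping as well as adjacent ranges is a sort-and-sweep merge. This is only a sketch under assumptions: `RangeMergeSketch` and `mergeRanges` are hypothetical names that do not exist in the PR, and each range is assumed to be an inclusive `[start, end]` pair with `start <= end`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the PR: merges inclusive [start, end] ranges.
class RangeMergeSketch {

  static List<Long[]> mergeRanges(List<Long[]> rangeList) {
    // Sort a copy by range start so the sweep visits ranges in order.
    List<Long[]> sorted = new ArrayList<>(rangeList);
    sorted.sort(Comparator.comparingLong((Long[] r) -> r[0]));

    List<Long[]> merged = new ArrayList<>();
    for (Long[] current : sorted) {
      if (merged.isEmpty()) {
        merged.add(new Long[] {current[0], current[1]});
        continue;
      }
      Long[] last = merged.get(merged.size() - 1);
      // Overlapping (start <= previous end) or adjacent (start == previous end + 1).
      // The second test only runs when start > previous end, so "+ 1" cannot overflow.
      if (current[0] <= last[1] || current[0] == last[1] + 1) {
        last[1] = Math.max(last[1], current[1]);
      } else {
        merged.add(new Long[] {current[0], current[1]});
      }
    }
    return merged;
  }
}
```

       Building a new list avoids nulling out entries and removing them afterwards, and a single condition covers both the adjacent and the overlapping case.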





[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

VenuReddy2103 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r532711858



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private void sortRange(List<Long[]> rangeList) {
+    rangeList.sort(new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] x, Long[] y) {
+        return Long.compare(x[0], y[0]);
+      }
+    });
+  }
+
+  private void combineRange(List<Long[]> rangeList) {
+    if (rangeList.size() > 1) {
+      for (int i = 0, j = i + 1; i < rangeList.size() - 1; i++, j++) {
+        long previousEnd = rangeList.get(i)[1];
+        long nextStart = rangeList.get(j)[0];
+        if (previousEnd + 1 == nextStart) {
+          rangeList.get(j)[0] = rangeList.get(i)[0];
+          rangeList.get(i)[0] = null;
+          rangeList.get(i)[1] = null;
+        }
+      }
+      rangeList.removeIf(item -> item[0] == null && item[1] == null);
+    }
+  }
+
+  private List<Long[]> getRangeListFromString(String rangeListString) {
+    String[] rangeStringList = rangeListString.trim().split(GeoConstants.DEFAULT_DELIMITER);
+    List<Long[]> rangeList = new ArrayList<>();
+    for (String rangeString : rangeStringList) {
+      String[] range = GeoHashUtils.splitStringToPoint(rangeString);
+      if (range.length != 2) {
+        throw new RuntimeException("each range is a pair need 2 data");
+      }
+      try {
+        rangeList.add(new Long[] {Long.valueOf(range[0]), Long.valueOf(range[1])});

Review comment:
       Better to check that Long.valueOf(range[0]) < Long.valueOf(range[1]), since this is user input.
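
       A minimal sketch of such a check, assuming the two bounds arrive as strings. `RangeParseSketch` and `parseRange` are hypothetical names, not code from the PR. Note one deliberate difference: the sketch rejects only `start > end`, on the assumption that equal bounds describe a valid single-ID range; use a strict comparison if that assumption does not hold.

```java
// Hypothetical helper, not part of the PR: parses one user-supplied range.
class RangeParseSketch {

  static Long[] parseRange(String startText, String endText) {
    long start;
    long end;
    try {
      start = Long.parseLong(startText.trim());
      end = Long.parseLong(endText.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "range bounds must be long values: " + startText + ", " + endText, e);
    }
    // Reject reversed bounds early instead of letting them reach the merge step.
    if (start > end) {
      throw new IllegalArgumentException(
          "range start " + start + " must not exceed range end " + end);
    }
    return new Long[] {start, end};
  }
}
```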





[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

VenuReddy2103 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r532715580



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolylineListExpression.java
##########
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.GeoOperationType;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+import org.locationtech.jts.geom.Coordinate;
+import org.locationtech.jts.geom.Geometry;
+import org.locationtech.jts.geom.GeometryFactory;
+import org.locationtech.jts.geom.LineString;
+import org.locationtech.jts.geom.Polygon;
+import org.locationtech.jts.io.WKTReader;
+import org.locationtech.jts.operation.buffer.BufferParameters;
+
+/**
+ * InPolylineList expression processor. It inputs the InPolylineList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolylineListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private static final GeometryFactory geoFactory = new GeometryFactory();
+
+  private String polylineString;
+
+  private Float bufferInMeter;
+
+  private GeoHashIndex instance;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolylineListExpression(String polylineString, Float bufferInMeter, String columnName,
+      CustomIndex indexInstance) {
+    this.polylineString = polylineString;
+    this.bufferInMeter = bufferInMeter;
+    this.instance = (GeoHashIndex) indexInstance;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // transform the distance unit meter to degree
+      double buffer = bufferInMeter / GeoConstants.CONVERSION_FACTOR_OF_METER_TO_DEGREE;
+
+      // 1. parse the polyline list string and get polygon from each polyline
+      List<Geometry> polygonList = new ArrayList<>();
+      WKTReader wktReader = new WKTReader();
+      Pattern pattern = Pattern.compile(GeoConstants.POLYLINE_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polylineString);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        LineString polylineCreatedFromStr = (LineString) wktReader.read(matchedStr);
+        Polygon polygonFromPolylineBuffer = (Polygon) polylineCreatedFromStr.buffer(
+            buffer, 0, BufferParameters.CAP_SQUARE);
+        polygonList.add(polygonFromPolylineBuffer);
+      }
+      // 2. get the range list of each polygon
+      if (polygonList.size() > 0) {

Review comment:
       Throw an exception when size < 2: the input comes from the user, and it may well not have matched the regular expression.
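
       A sketch of failing fast when too few matches are found. Everything named here is hypothetical: `MatchCountSketch`, `extractMatches`, and the stand-in `POLYLINE` pattern are for illustration only, and the actual `GeoConstants.POLYLINE_REG_EXPRESSION` in the PR may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR.
class MatchCountSketch {

  // Stand-in pattern for illustration; the PR's real expression may differ.
  static final Pattern POLYLINE = Pattern.compile("LINESTRING \\(.*?\\)");

  static List<String> extractMatches(Pattern pattern, String input, int minCount) {
    List<String> matches = new ArrayList<>();
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
      matches.add(matcher.group());
    }
    // User input may not match at all, so report it instead of continuing silently.
    if (matches.size() < minCount) {
      throw new IllegalArgumentException("expected at least " + minCount
          + " matches, but found " + matches.size() + " in: " + input);
    }
    return matches;
  }
}
```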





[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

VenuReddy2103 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r532569787



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonListExpression.java
##########
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonList expression processor. It inputs the InPolygonList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonListExpression extends UnknownExpression implements ConditionalExpression {
+
+  private String polygonListString;
+
+  private String opType;
+
+  private GeoHashIndex instance;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonListExpression(String polygonListString, String opType, String columnName,
+      CustomIndex indexInstance) {
+    this.polygonListString = polygonListString;
+    this.opType = opType;
+    this.instance = (GeoHashIndex)indexInstance;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the polygon list string
+      List<String> polygons = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.POLYGON_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonListString);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        polygons.add(matchedStr);
+      }
+      if (polygons.size() < 2) {
+        throw new RuntimeException("polygon list need at least 2 polygons, really has " +
+            polygons.size());
+      }
+      // 2. get the range list of each polygon
+      List<Long[]> processedRangeList = instance.query(polygons.get(0));
+      for (int i = 1; i < polygons.size(); i++) {
+        List<Long[]> tempRangeList = instance.query(polygons.get(i));
+        processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+      }
+      ranges = processedRangeList;
+      GeoHashUtils.validateRangeList(ranges);

Review comment:
       The intention of `validateRangeList()` is to check that each element of the List<Long[]> is an array of exactly 2 elements (i.e., only the min and max value of a range). It exists because the `instance` class can be implemented by any user class adhering to the `CustomIndex` interface, so we need this validation to make sure `query()` returns a list of 2-element arrays.
   I think we need to call `validateRangeList()` right after each call to `instance.query()`. It is not required at the end, after the merge.
   Please check the other places in this PR too.
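
       Roughly, the call order being suggested looks like the sketch below. It is not the PR's code: `QueryValidationSketch`, `validatePairs`, and `queryAll` are hypothetical names, and a plain `Function<String, List<Long[]>>` stands in for a `CustomIndex` implementation's `query()` so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not part of the PR.
class QueryValidationSketch {

  // Checks that every element is a [min, max] pair.
  static void validatePairs(List<Long[]> ranges) {
    for (Long[] range : ranges) {
      if (range == null || range.length != 2) {
        throw new IllegalStateException(
            "query() must return [min, max] pairs, got " + Arrays.toString(range));
      }
    }
  }

  // The Function is a stand-in for a user-provided index's query() method.
  static List<List<Long[]>> queryAll(
      Function<String, List<Long[]>> index, List<String> polygons) {
    List<List<Long[]>> results = new ArrayList<>();
    for (String polygon : polygons) {
      List<Long[]> ranges = index.apply(polygon);
      // Validate right after each query, before any merge touches the result.
      validatePairs(ranges);
      results.add(ranges);
    }
    return results;
  }
}
```

       Validating at the source means a malformed result from a user-provided index is reported before the merge logic indexes into it.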





[GitHub] [carbondata] brijoobopanna commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

brijoobopanna commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-736288833


   Thanks for your contribution. Please raise a discussion in the community and get the design approved:
   http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Proposal-Thoughts-on-general-guidelines-to-follow-in-Apache-CarbonData-community-td68525.html#a68578



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533959413



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java
##########
@@ -0,0 +1,411 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+public class GeoHashUtils {
+
+  /**
+   * Get the degree of each grid in the east-west direction.
+   *
+   * @param originLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Delta X is the degree of each grid in the east-west direction
+   */
+  public static double getDeltaX(double originLatitude, int gridSize) {
+    double mCos = Math.cos(originLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS * mCos);
+  }
+
+  /**
+   * Get the degree of each grid in the north-south direction.
+   *
+   * @param gridSize the grid size
+   * @return Delta Y is the degree of each grid in the north-south direction
+   */
+  public static double getDeltaY(int gridSize) {
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS);
+  }
+
+  /**
+   * Calculate the number of knives cut
+   *
+   * @param gridSize the grid size
+   * @param originLatitude the origin point latitude
+   * @return The number of knives cut
+   */
+  public static int getCutCount(int gridSize, double originLatitude) {
+    double deltaX = getDeltaX(originLatitude, gridSize);
+    int countX = Double.valueOf(
+        Math.ceil(Math.log(2 * GeoConstants.CONVERT_FACTOR / deltaX) / Math.log(2))).intValue();
+    double deltaY = getDeltaY(gridSize);
+    int countY = Double.valueOf(
+        Math.ceil(Math.log(GeoConstants.CONVERT_FACTOR / deltaY) / Math.log(2))).intValue();
+    return Math.max(countX, countY);
+  }
+
+  /**
+   * Convert input longitude and latitude to GeoID
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation.
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID
+   */
+  public static long lonLat2GeoID(long longitude, long latitude, double oriLatitude, int gridSize) {
+    long longtitudeByRatio = longitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    long latitudeByRatio = latitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    int[] ij = lonLat2ColRow(longtitudeByRatio, latitudeByRatio, oriLatitude, gridSize);
+    return colRow2GeoID(ij[0], ij[1]);
+  }
+
+  /**
+   * Calculate geo id through grid index coordinates, the row and column of grid coordinates
+   * can be transformed by latitude and longitude
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param oriLatitude the latitude of origin point,which is used to calculate the deltaX and cut
+   * level.
+   * @param gridSize the size of minimal grid after cut
+   * @return Grid ID value [row, column], column starts from 1
+   */
+  public static int[] lonLat2ColRow(long longitude, long latitude, double oriLatitude,
+      int gridSize) {
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    int column = (int) Math.floor(longitude / getDeltaX(oriLatitude, gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    int row = (int) Math.floor(latitude / getDeltaY(gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    return new int[] {row, column};
+  }
+
+  /**
+   * Calculate the corresponding GeoId value from the grid coordinates
+   *
+   * @param row Gridded row index
+   * @param column Gridded column index
+   * @return hash id
+   */
+  public static long colRow2GeoID(int row, int column) {
+    long geoID = 0L;
+    int bit = 0;
+    long sourceRow = (long) row;
+    long sourceColumn = (long)column;
+    while (sourceRow > 0 || sourceColumn > 0) {
+      geoID = geoID | ((sourceRow & 1) << (2 * bit + 1)) | ((sourceColumn & 1) << 2 * bit);
+      sourceRow >>= 1;
+      sourceColumn >>= 1;
+      bit++;
+    }
+    return geoID;
+  }
+
+  /**
+   * Convert input GeoID to longitude and latitude
+   *
+   * @param geoId GeoID
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Longitude and latitude of grid center point
+   */
+  public static double[] geoID2LngLat(long geoId, double oriLatitude, int gridSize) {
+    int[] rowCol = geoID2ColRow(geoId);
+    int column = rowCol[1];
+    int row = rowCol[0];
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    double deltaX = getDeltaX(oriLatitude, gridSize);
+    double deltaY = getDeltaY(gridSize);
+    double longitude = (column - (1 << (cutLevel - 1)) + 0.5) * deltaX;
+    double latitude = (row - (1 << (cutLevel - 1)) + 0.5) * deltaY;
+    longitude = new BigDecimal(longitude).setScale(GeoConstants.SCALE_OF_LONGITUDE_AND_LATITUDE,
+        BigDecimal.ROUND_HALF_UP).doubleValue();
+    latitude = new BigDecimal(latitude).setScale(GeoConstants.SCALE_OF_LONGITUDE_AND_LATITUDE,
+        BigDecimal.ROUND_HALF_UP).doubleValue();
+    return new double[]{longitude, latitude};
+  }
+
+  /**
+   * Convert input GeoID to grid column and row
+   *
+   * @param geoId GeoID
+   * @return grid column index and row index
+   */
+  public static int[] geoID2ColRow(long geoId) {
+    int row = 0;
+    int column = 0;
+    int bit = 0;
+    long source = geoId;
+    while (source > 0) {
+      column |= (source & 1) << bit;
+      source >>= 1;
+      row |= (source & 1) << bit;
+      source >>= 1;
+      bit++;
+    }
+    return new int[] {row, column};
+  }
+
+  /**
+   * Convert input string polygon to GeoID range list
+   *
+   * @param polygon input polygon string
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID range list of the polygon
+   */
+  public static List<Long[]> getRangeList(String polygon, double oriLatitude, int gridSize) {
+    List<double[]> queryPointList = getPointListFromPolygon(polygon);
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    double deltaY = getDeltaY(gridSize);
+    double maxLatitudeOfInitialArea = deltaY * Math.pow(2, cutLevel - 1);
+    double mCos = Math.cos(oriLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    double maxLongitudeOfInitialArea = maxLatitudeOfInitialArea / mCos;
+    double minLatitudeOfInitialArea = -maxLatitudeOfInitialArea;
+    double minLongitudeOfInitialArea = -maxLongitudeOfInitialArea;
+    QuadTreeCls qTreee = new QuadTreeCls(minLongitudeOfInitialArea, minLatitudeOfInitialArea,
+        maxLongitudeOfInitialArea, maxLatitudeOfInitialArea, cutLevel);
+    qTreee.insert(queryPointList);
+    return qTreee.getNodesData();
+  }
+
+  /**
+   * Convert input GeoID to upper layer GeoID of pyramid
+   *
+   * @param geoId GeoID
+   * @return the upper layer GeoID
+   */
+  public static long convertToUpperLayerGeoId(long geoId) {
+    return geoId >> 2;
+  }
+
+  /**
+   * Parse input polygon string to point list
+   *
+   * @param polygon input polygon string, example: POLYGON (35 10, 45 45, 15 40, 10 20, 35 10)
+   * @return the point list
+   */
+  public static List<double[]> getPointListFromPolygon(String polygon) {
+    String[] pointStringList = polygon.trim().split(GeoConstants.DEFAULT_DELIMITER);
+    if (4 > pointStringList.length) {
+      throw new RuntimeException(
+          "polygon need at least 3 points, really has " + pointStringList.length);
+    }
+    List<double[]> queryList = new ArrayList<>();
+    for (String pointString : pointStringList) {
+      String[] point = splitStringToPoint(pointString);
+      if (2 != point.length) {
+        throw new RuntimeException("longitude and latitude is a pair need 2 data");
+      }
+      try {
+        queryList.add(new double[] {Double.valueOf(point[0]), Double.valueOf(point[1])});
+      } catch (NumberFormatException e) {
+        throw new RuntimeException("can not covert the string data to double", e);
+      }
+    }
+    if (!checkPointsSame(pointStringList[0], pointStringList[pointStringList.length - 1])) {
+      throw new RuntimeException("the first point and last point in polygon should be same");
+    } else {
+      return queryList;
+    }
+  }
+
+  private static boolean checkPointsSame(String point1, String point2) {
+    String[] points1 = splitStringToPoint(point1);
+    String[] points2 = splitStringToPoint(point2);
+    return points1[0].equals(points2[0]) && points1[1].equals(points2[1]);
+  }
+
+  public static String[] splitStringToPoint(String str) {
+    return str.trim().split("\\s+");
+  }
+
+  public static void validateRangeList(List<Long[]> ranges) {
+    for (Long[] range : ranges) {
+      if (range.length != 2) {
+        throw new RuntimeException("Query processor must return list of ranges with each range "
+            + "containing minimum and maximum values");
+      }
+    }
+  }
+
+  /**
+   * Get two polygon's union and intersection
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list of processed set
+   */
+  public static List<Long[]> processRangeList(List<Long[]> rangeListA, List<Long[]> rangeListB,
+      String opType) {
+    List<Long[]> processedRangeList;
+    GeoOperationType operationType = GeoOperationType.getEnum(opType);
+    if (operationType == null) {
+      throw new RuntimeException("Unsupported operation type " + opType);
+    }
+    switch (operationType) {
+      case OR:
+        processedRangeList = getPolygonUnion(rangeListA, rangeListB);
+        break;
+      case AND:
+        processedRangeList = getPolygonIntersection(rangeListA, rangeListB);
+        break;
+      default:
+        throw new RuntimeException("Unsupported operation type " + opType);
+    }
+    return processedRangeList;
+  }
+
+  /**
+   * Get two polygon's union
+   *
+   * @param rangeListA geoId range list of polygonA
+   * @param rangeListB geoId range list of polygonB
+   * @return geoId range list after union
+   */
+  private static List<Long[]> getPolygonUnion(List<Long[]> rangeListA, List<Long[]> rangeListB) {
+    if (Objects.isNull(rangeListA)) {
+      return rangeListB;
+    }
+    if (Objects.isNull(rangeListB)) {
+      return rangeListA;
+    }
+    int sizeFirst = rangeListA.size();
+    int sizeSecond = rangeListB.size();
+    if (sizeFirst > sizeSecond) {
+      rangeListA.addAll(sizeFirst, rangeListB);
+      return mergeList(rangeListA);
+    } else {
+      rangeListB.addAll(sizeSecond, rangeListA);
+      return mergeList(rangeListB);
+    }
+  }
+
+  private static List<Long[]> mergeList(List<Long[]> list) {
+    if (list.size() == 0) {
+      return list;
+    }
+    Collections.sort(list, new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] arr1, Long[] arr2) {
+        return Long.compare(arr1[0], arr2[0]);
+      }
+    });
+    Long[] min;
+    Long[] max;
+    for (int i = 0; i < list.size(); i++) {
+      min = list.get(i);
+      for (int j = i + 1; j < list.size(); j++) {
+        max = list.get(j);
+        if (min[1] + 1 >= max[0]) {

Review comment:
       This part was developed by the Discovery Team. We agree with your suggestion, and I have added a `break`.
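
For readers following the thread, here is a minimal sketch of the idea behind the `break`: once the ranges are sorted by their start value, the inner scan can stop at the first range that neither overlaps nor touches the current one, because every later range starts even further to the right. This is not the code in the PR; the class and method names are hypothetical, and it uses `long[]` pairs instead of `Long[]` for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the PR's GeoHashUtils.mergeList.
final class RangeMergeSketch {

  /**
   * Merges inclusive [min, max] ranges that overlap or are adjacent.
   * Assumes every max is below Long.MAX_VALUE, so "max + 1" cannot overflow.
   */
  static List<long[]> merge(List<long[]> ranges) {
    // Work on copies so the caller's arrays are left untouched.
    List<long[]> sorted = new ArrayList<>();
    for (long[] range : ranges) {
      sorted.add(range.clone());
    }
    sorted.sort((a, b) -> Long.compare(a[0], b[0]));

    List<long[]> merged = new ArrayList<>();
    int i = 0;
    while (i < sorted.size()) {
      long[] current = sorted.get(i);
      int j = i + 1;
      for (; j < sorted.size(); j++) {
        long[] next = sorted.get(j);
        if (current[1] + 1 >= next[0]) {
          // Overlapping or adjacent: extend the current range.
          current[1] = Math.max(current[1], next[1]);
        } else {
          // Sorted input: no later range can touch the current one.
          break;
        }
      }
      merged.add(current);
      i = j;
    }
    return merged;
  }
}
```

With the input ranges `[1, 3]`, `[10, 12]`, `[4, 6]`, `[11, 20]`, this sketch yields `[1, 6]` and `[10, 20]`. Because each range is visited once after sorting, the scan itself is linear in the number of ranges.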




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533960531



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoOperationType.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+public enum GeoOperationType {
+  OR("OR"),
+  AND("AND");
+
+  private String type;
+
+  GeoOperationType(String type) {

Review comment:
       Done





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533960629



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonListExpression.java
##########
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonList expression processor. It inputs the InPolygonList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonListExpression extends UnknownExpression implements ConditionalExpression {

Review comment:
       Done

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression

Review comment:
       Done

##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolylineListExpression.java
##########
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.GeoOperationType;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+import org.locationtech.jts.geom.Coordinate;
+import org.locationtech.jts.geom.Geometry;
+import org.locationtech.jts.geom.GeometryFactory;
+import org.locationtech.jts.geom.LineString;
+import org.locationtech.jts.geom.Polygon;
+import org.locationtech.jts.io.WKTReader;
+import org.locationtech.jts.operation.buffer.BufferParameters;
+
+/**
+ * InPolylineList expression processor. It inputs the InPolylineList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolylineListExpression extends UnknownExpression

Review comment:
       Done





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533961464



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonListExpression.java
##########
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonList expression processor. It inputs the InPolygonList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonListExpression extends UnknownExpression implements ConditionalExpression {
+
+  private String polygonListString;
+
+  private String opType;
+
+  private GeoHashIndex instance;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonListExpression(String polygonListString, String opType, String columnName,
+      CustomIndex indexInstance) {
+    this.polygonListString = polygonListString;
+    this.opType = opType;
+    this.instance = (GeoHashIndex)indexInstance;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the polygon list string
+      List<String> polygons = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.POLYGON_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonListString);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        polygons.add(matchedStr);
+      }
+      if (polygons.size() < 2) {
+        throw new RuntimeException("polygon list need at least 2 polygons, really has " +
+            polygons.size());
+      }
+      // 2. get the range list of each polygon
+      List<Long[]> processedRangeList = instance.query(polygons.get(0));
+      for (int i = 1; i < polygons.size(); i++) {
+        List<Long[]> tempRangeList = instance.query(polygons.get(i));
+        processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+      }
+      ranges = processedRangeList;
+      GeoHashUtils.validateRangeList(ranges);

Review comment:
       Done. I have checked all calls to `instance.query`.





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533962334



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private void sortRange(List<Long[]> rangeList) {
+    rangeList.sort(new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] x, Long[] y) {
+        return Long.compare(x[0], y[0]);
+      }
+    });
+  }
+
+  private void combineRange(List<Long[]> rangeList) {
+    if (rangeList.size() > 1) {
+      for (int i = 0, j = i + 1; i < rangeList.size() - 1; i++, j++) {
+        long previousEnd = rangeList.get(i)[1];
+        long nextStart = rangeList.get(j)[0];
+        if (previousEnd + 1 == nextStart) {
+          rangeList.get(j)[0] = rangeList.get(i)[0];
+          rangeList.get(i)[0] = null;
+          rangeList.get(i)[1] = null;
+        }
+      }
+      rangeList.removeIf(item -> item[0] == null && item[1] == null);
+    }
+  }
+
+  private List<Long[]> getRangeListFromString(String rangeListString) {
+    String[] rangeStringList = rangeListString.trim().split(GeoConstants.DEFAULT_DELIMITER);
+    List<Long[]> rangeList = new ArrayList<>();
+    for (String rangeString : rangeStringList) {
+      String[] range = GeoHashUtils.splitStringToPoint(rangeString);
+      if (range.length != 2) {
+        throw new RuntimeException("each range is a pair need 2 data");
+      }
+      try {
+        rangeList.add(new Long[] {Long.valueOf(range[0]), Long.valueOf(range[1])});
+      } catch (NumberFormatException e) {
+        throw new RuntimeException("can not covert the string data to long", e);

Review comment:
       I don't understand this. Shouldn't it be `covert string data to long`?





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533968484



##########
File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java
##########
@@ -26,4 +26,34 @@ private GeoConstants() {
 
   // GeoHash type Spatial Index
   public static final String GEOHASH = "geohash";
+
+  // Regular expression to parse input polygons for IN_POLYGON_LIST
+  public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";

Review comment:
       We want to use a generic, fixed input format. We looked at several other definitions before deciding on this one:
   https://postgis.net/docs/ST_GeomFromText.html
   https://stackoverflow.com/questions/8576228/geomfromtext-function-documentation
   https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Geometric_objects
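
   As a rough illustration of how the WKT-style input is picked apart, here is a minimal standalone sketch that applies the same pattern string as `POLYGON_REG_EXPRESSION` from the diff. The class name `PolygonWktSketch`, the method `extractPolygonBodies`, and the sample coordinates are made up for this example and are not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class and method names are hypothetical, not from the PR.
public class PolygonWktSketch {
  // Same pattern string as GeoConstants.POLYGON_REG_EXPRESSION in the diff.
  static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";

  // Returns the coordinate text found between "POLYGON ((" and "))" for each polygon.
  static List<String> extractPolygonBodies(String input) {
    List<String> bodies = new ArrayList<>();
    Matcher matcher = Pattern.compile(POLYGON_REG_EXPRESSION).matcher(input);
    while (matcher.find()) {
      bodies.add(matcher.group());
    }
    return bodies;
  }

  public static void main(String[] args) {
    String input = "POLYGON ((1 1, 1 2, 2 2, 1 1)), POLYGON ((3 3, 3 4, 4 4, 3 3))";
    for (String body : extractPolygonBodies(input)) {
      System.out.println(body);
    }
  }
}
```

   The lookbehind and lookahead keep the `POLYGON ((` and `))` wrappers out of the match, so each result is only the comma-separated point list.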





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533971365



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {

Review comment:
       But PolygonRangeListExpression is expected to accept one or more range lists, for example `IN_POLYGON_RANGE_LIST('RANGELIST (855279368850 855279368852, 855280799610 855280799612, 855282156300 855282157400)', 'OR')`.
   This scenario is not the same as the polygon list case.
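
   To make the one-or-more range list case concrete, here is a minimal sketch that extracts each `RANGELIST (...)` body, parses it into start/end pairs, and combines the lists with an OR (union). The pattern below is an assumption modeled on the polygon pattern, since the real `GeoConstants.RANGELIST_REG_EXPRESSION` is not shown in this thread. `RangeListSketch`, `parseRangeList`, `union`, and `evaluateOr` are hypothetical names and do not reproduce `GeoHashUtils.processRangeList`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: not the PR's implementation.
public class RangeListSketch {
  // Assumed pattern; the real RANGELIST_REG_EXPRESSION is not shown in this thread.
  static final Pattern RANGELIST = Pattern.compile("(?<=RANGELIST \\()(.*?)(?=\\))");

  // Parses "start end, start end, ..." into a list of [start, end] pairs.
  static List<long[]> parseRangeList(String body) {
    List<long[]> ranges = new ArrayList<>();
    for (String pair : body.trim().split(",")) {
      String[] ends = pair.trim().split("\\s+");
      if (ends.length != 2) {
        throw new IllegalArgumentException("each range needs exactly 2 values: " + pair);
      }
      ranges.add(new long[] {Long.parseLong(ends[0]), Long.parseLong(ends[1])});
    }
    return ranges;
  }

  // Union of two range lists; overlapping or adjacent ranges are merged.
  static List<long[]> union(List<long[]> a, List<long[]> b) {
    List<long[]> all = new ArrayList<>(a);
    all.addAll(b);
    all.sort((x, y) -> Long.compare(x[0], y[0]));
    List<long[]> merged = new ArrayList<>();
    for (long[] r : all) {
      if (!merged.isEmpty() && r[0] <= merged.get(merged.size() - 1)[1] + 1) {
        long[] last = merged.get(merged.size() - 1);
        last[1] = Math.max(last[1], r[1]);
      } else {
        merged.add(new long[] {r[0], r[1]});
      }
    }
    return merged;
  }

  // Applies OR across every RANGELIST found in the input string.
  static List<long[]> evaluateOr(String input) {
    List<long[]> result = new ArrayList<>();
    Matcher matcher = RANGELIST.matcher(input);
    while (matcher.find()) {
      result = union(result, parseRangeList(matcher.group()));
    }
    return result;
  }

  public static void main(String[] args) {
    String input = "RANGELIST (855279368850 855279368852, "
        + "855280799610 855280799612, 855282156300 855282157400)";
    for (long[] r : evaluateOr(input)) {
      System.out.println(r[0] + " " + r[1]);
    }
  }
}
```

   With a single range list the fold simply normalizes that list, which is why the one-list case needs no special handling in this sketch.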





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533971772



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);
+      }
+    } catch (Exception e) {

Review comment:
       Yes, I have removed it.





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533971945



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);

Review comment:
       Right, I have removed it.





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533972546



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java
##########
@@ -73,8 +77,24 @@ private int getNearestRangeIndex(List<Long[]> ranges, long searchForNumber) {
    * @return True or False  True if current block or blocket needs to be scanned. Otherwise False.
    */
   private boolean isScanRequired(byte[] maxValue, byte[] minValue) {
-    PolygonExpression polygon = (PolygonExpression) exp;
-    List<Long[]> ranges = polygon.getRanges();
+    List<Long[]> ranges = new ArrayList<>();
+    if (exp instanceof PolygonExpression) {

Review comment:
       Removed the changes.





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533972715



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+/**
+ * InPolygonRangeList expression processor. It inputs the InPolygonRangeList string to
+ * the Geo implementation's query method, inputs lists of range of IDs and is to be calculated
+ * the and/or/diff range list to filter. And then, build InExpression with list of all the IDs
+ * present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolygonRangeListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private String polygonRangeList;
+
+  private String opType;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolygonRangeListExpression(String polygonRangeList, String opType, String columnName) {
+    this.polygonRangeList = polygonRangeList;
+    this.opType = opType;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // 1. parse the range list string
+      List<String> rangeLists = new ArrayList<>();
+      Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polygonRangeList);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        rangeLists.add(matchedStr);
+      }
+      // 2. process the range lists
+      if (rangeLists.size() > 0) {
+        List<Long[]> processedRangeList = getRangeListFromString(rangeLists.get(0));
+        for (int i = 1; i < rangeLists.size(); i++) {
+          List<Long[]> tempRangeList = getRangeListFromString(rangeLists.get(i));
+          processedRangeList = GeoHashUtils.processRangeList(
+            processedRangeList, tempRangeList, opType);
+        }
+        ranges = processedRangeList;
+        GeoHashUtils.validateRangeList(ranges);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private void sortRange(List<Long[]> rangeList) {
+    rangeList.sort(new Comparator<Long[]>() {
+      @Override
+      public int compare(Long[] x, Long[] y) {
+        return Long.compare(x[0], y[0]);
+      }
+    });
+  }
+
+  private void combineRange(List<Long[]> rangeList) {
+    if (rangeList.size() > 1) {

Review comment:
       I have removed it.





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533974578



##########
File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolylineListExpression.java
##########
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.GeoOperationType;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+import org.locationtech.jts.geom.Coordinate;
+import org.locationtech.jts.geom.Geometry;
+import org.locationtech.jts.geom.GeometryFactory;
+import org.locationtech.jts.geom.LineString;
+import org.locationtech.jts.geom.Polygon;
+import org.locationtech.jts.io.WKTReader;
+import org.locationtech.jts.operation.buffer.BufferParameters;
+
+/**
+ * InPolylineList expression processor. It inputs the InPolylineList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolylineListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private static final GeometryFactory geoFactory = new GeometryFactory();
+
+  private String polylineString;
+
+  private Float bufferInMeter;
+
+  private GeoHashIndex instance;
+
+  private List<Long[]> ranges = new ArrayList<Long[]>();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolylineListExpression(String polylineString, Float bufferInMeter, String columnName,
+      CustomIndex indexInstance) {
+    this.polylineString = polylineString;
+    this.bufferInMeter = bufferInMeter;
+    this.instance = (GeoHashIndex) indexInstance;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // transform the distance unit meter to degree
+      double buffer = bufferInMeter / GeoConstants.CONVERSION_FACTOR_OF_METER_TO_DEGREE;
+
+      // 1. parse the polyline list string and get polygon from each polyline
+      List<Geometry> polygonList = new ArrayList<>();
+      WKTReader wktReader = new WKTReader();
+      Pattern pattern = Pattern.compile(GeoConstants.POLYLINE_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polylineString);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        LineString polylineCreatedFromStr = (LineString) wktReader.read(matchedStr);
+        Polygon polygonFromPolylineBuffer = (Polygon) polylineCreatedFromStr.buffer(
+            buffer, 0, BufferParameters.CAP_SQUARE);
+        polygonList.add(polygonFromPolylineBuffer);
+      }
+      // 2. get the range list of each polygon
+      if (polygonList.size() > 0) {

Review comment:
       For example: `IN_POLYLINE_LIST('LINESTRING (120.199242 30.324464, 120.190359 30.315388)', 65)`
   The regular expression matches the `LINESTRING (120.199242 30.324464, 120.190359 30.315388)` part, and the UDF accepts one or more LINESTRINGs in its first input parameter.
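
   A minimal sketch of pulling one or more LINESTRINGs out of the first parameter could look like the following. The pattern is an assumption, since `GeoConstants.POLYLINE_REG_EXPRESSION` is not shown in this thread. The sketch parses the points by hand instead of using the JTS `WKTReader` so that it stays self-contained. `PolylineListSketch` and `parsePolylines` are hypothetical names, and the buffer step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: plain-Java stand-in for the WKTReader-based parsing in the PR.
public class PolylineListSketch {
  // Assumed pattern; the real POLYLINE_REG_EXPRESSION is not shown in this thread.
  static final Pattern LINESTRING = Pattern.compile("LINESTRING \\((.*?)\\)");

  // Returns one [x, y] array per point, one double[][] per LINESTRING found.
  static List<double[][]> parsePolylines(String input) {
    List<double[][]> polylines = new ArrayList<>();
    Matcher matcher = LINESTRING.matcher(input);
    while (matcher.find()) {
      String[] points = matcher.group(1).split(",");
      double[][] coords = new double[points.length][];
      for (int i = 0; i < points.length; i++) {
        String[] xy = points[i].trim().split("\\s+");
        if (xy.length != 2) {
          throw new IllegalArgumentException("each point needs 2 values: " + points[i]);
        }
        coords[i] = new double[] {Double.parseDouble(xy[0]), Double.parseDouble(xy[1])};
      }
      polylines.add(coords);
    }
    return polylines;
  }

  public static void main(String[] args) {
    String input = "LINESTRING (120.199242 30.324464, 120.190359 30.315388)";
    for (double[][] line : parsePolylines(input)) {
      System.out.println("polyline with " + line.length + " points");
    }
  }
}
```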





[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

GitBox
In reply to this post by GitBox

shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533986229



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
##########
@@ -276,17 +274,14 @@ class GeoTest extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
          | 'SPATIAL_INDEX.spatial.sourcecolumns'='longitude, latitude',
          | 'SPATIAL_INDEX.spatial.originLatitude'='39.832277',
          | 'SPATIAL_INDEX.spatial.gridSize'='60',
-         | 'SPATIAL_INDEX.spatial.minLongitude'='115.811865',
-         | 'SPATIAL_INDEX.spatial.maxLongitude'='116.782233',
-         | 'SPATIAL_INDEX.spatial.minLatitude'='39.832277',
-         | 'SPATIAL_INDEX.spatial.maxLatitude'='40.225281',
          | 'SPATIAL_INDEX.spatial.conversionRatio'='1000000')
        """.stripMargin)
     loadData(sourceTable)
     createTable(targetTable)
+    // INSERT INTO will keep SPATIAL_INDEX column from sourceTable instead of generating internally
     sql(s"insert into  $targetTable select * from $sourceTable")
-    checkAnswer(sql(s"select *from $targetTable where mygeohash = '2196036'"),
-      Seq(Row(2196036, 1575428400000L, 116337069, 39951887)))
+    checkAnswer(sql(s"select *from $targetTable where mygeohash = '233137655761'"),

Review comment:
       It is a requirement from Discovery. From the PR message: `3. Load data (include LOAD and INSERT INTO) allows user to input spatial index, which column will still generated internally when user does not give.`
   sourceTable contains the mygeohash value and the insert supplies it, so it is copied to targetTable as is. If the insert does not supply this value, targetTable generates it internally.
   Admittedly, if the input mygeohash is wrong, it will be inconsistent with the `gridsize` and `originLatitude` of targetTable.
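
   The behavior described here comes down to "use the supplied spatial index if there is one, otherwise generate it". Below is a minimal sketch of that rule. `SpatialIndexFallbackSketch`, `resolveSpatialIndex`, and the `generator` callback are hypothetical stand-ins for the table's internal index generation, not code from the PR.

```java
import java.util.function.LongBinaryOperator;

// Illustrative only: names and the generator callback are hypothetical.
public class SpatialIndexFallbackSketch {
  // Keeps a user-supplied index value as is; otherwise derives it from the source columns.
  static long resolveSpatialIndex(Long userSupplied, long longitude, long latitude,
      LongBinaryOperator generator) {
    if (userSupplied != null) {
      // No validation against the target table's gridSize/originLatitude happens here,
      // which is the inconsistency risk discussed above.
      return userSupplied;
    }
    return generator.applyAsLong(longitude, latitude);
  }

  public static void main(String[] args) {
    // Placeholder generator; a real one would encode longitude/latitude into a grid ID.
    LongBinaryOperator placeholder = (lon, lat) -> 0L;
    System.out.println(resolveSpatialIndex(233137655761L, 116337069L, 39951887L, placeholder));
    System.out.println(resolveSpatialIndex(null, 116337069L, 39951887L, placeholder));
  }
}
```

   Because the supplied value bypasses the generator entirely, nothing in this path checks it against the target table's parameters.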



