shenjiayu17 opened a new pull request #4012: URL: https://github.com/apache/carbondata/pull/4012 ### Why is this PR needed? Spatial index feature optimization of CarbonData ### What changes were proposed in this PR? 1. Update spatial index encoded algorithm, which can reduce the required properties of creating geo table 2. Enhance geo query UDFs, support querying geo table with polygon list, polyline list, geoId range list. And add some geo transforming util UDFs. 3. Load data (include LOAD and INSERT INTO) allows user to input spatial index, which column will still generated internally when user does not give. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
ajantha-bhat commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731243925 retest this please ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731248638 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3077/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731249184 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4833/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731890305 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3097/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731890504 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4850/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731919095 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4851/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-731920282 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3098/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732021180 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3101/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732021580 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4854/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
marchpure commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732209550 @MarvinLitt Please review ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r529216859 ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java ########## @@ -26,4 +26,31 @@ private GeoConstants() { // GeoHash type Spatial Index public static final String GEOHASH = "geohash"; + + // Regular expression to parse input polygons for IN_POLYGON_LIST + // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)"; + public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))"; + + // Regular expression to parse input polylines for IN_POLYLINE_LIST + public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)"; + + // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST + public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))"; + + // delimiter of input points or ranges + public static final String DEFAULT_DELIMITER = ","; + + // conversion factor of angle to radian + public static final double CONVERT_FACTOR = 180.0; + // Earth radius + public static final double EARTH_RADIUS = 6371004.0; Review comment: Can remove EARTH_RADIUS const def from GeoHashIndex.java now. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733432993 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4892/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733433233 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3138/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
shenjiayu17 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r530130471 ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java ########## @@ -26,4 +26,31 @@ private GeoConstants() { // GeoHash type Spatial Index public static final String GEOHASH = "geohash"; + + // Regular expression to parse input polygons for IN_POLYGON_LIST + // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)"; + public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))"; + + // Regular expression to parse input polylines for IN_POLYLINE_LIST + public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)"; + + // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST + public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))"; + + // delimiter of input points or ranges + public static final String DEFAULT_DELIMITER = ","; + + // conversion factor of angle to radian + public static final double CONVERT_FACTOR = 180.0; + // Earth radius + public static final double EARTH_RADIUS = 6371004.0; Review comment: Done ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733502985 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4896/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733503678 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3142/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
MarvinLitt commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r529163393 ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java ########## @@ -26,4 +26,31 @@ private GeoConstants() { // GeoHash type Spatial Index public static final String GEOHASH = "geohash"; + + // Regular expression to parse input polygons for IN_POLYGON_LIST + // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)"; + public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))"; + + // Regular expression to parse input polylines for IN_POLYLINE_LIST + public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)"; + + // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST + public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))"; + + // delimiter of input points or ranges + public static final String DEFAULT_DELIMITER = ","; + + // conversion factor of angle to radian + public static final double CONVERT_FACTOR = 180.0; + // Earth radius + public static final double EARTH_RADIUS = 6371004.0; + + // used in Geo Hash calculation formula for improving calculation accuracy + public static final int CONVERSION_RATIO = 100000000; + + // used for multiplying input longitude and latitude which are processed by * 10E6 + public static final int CONVERSION_FACTOR_FOR_ACCURACY = 100; + + // Length in meters of 1 degree of latitude + public static final double CONVERSION_FACTOR_OF_METER_TO_DEGREE = 111320; Review comment: delete comment code ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashIndex.java ########## @@ -245,48 +195,24 @@ public String generate(List<?> sources) throws Exception { */ @Override public List<Long[]> query(String polygon) throws Exception { - if (!validate(polygon)) { - return null; - } else { - String[] pointList = polygon.trim().split(","); - List<double[]> queryList = new ArrayList<>(); - for (String str: pointList) { - String[] points = splitString(str); - if (2 != points.length) { - throw new RuntimeException("longitude and latitude is a pair need 2 data"); - } else { - try { - queryList.add(new double[] {Double.valueOf(points[0]), Double.valueOf(points[1])}); - } catch (NumberFormatException e) { - throw new RuntimeException("can not covert the string data to double", e); - } - } - } - if (!checkPointsSame(pointList[0], pointList[pointList.length - 1])) { - throw new RuntimeException("the first point and last point in polygon should be same"); - } else { - List<Long[]> rangeList = getPolygonRangeList(queryList); - return rangeList; - } + try { + List<double[]> queryList = GeoHashUtils.getPointListFromPolygon(polygon); + return getPolygonRangeList(queryList); + } catch (Exception e) { Review comment: specific exception type should be specified ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -95,6 +95,12 @@ class CarbonEnv { sparkSession.udf.register("text_match", new TextMatchUDF) sparkSession.udf.register("text_match_with_limit", new TextMatchMaxDocUDF) sparkSession.udf.register("in_polygon", new InPolygonUDF) + sparkSession.udf.register("in_polygon_list", new InPolygonListUDF) + sparkSession.udf.register("in_polyline_list", new InPolylineListUDF) + sparkSession.udf.register("in_polygon_range_list", new InPolygonRangeListUDF) + Review comment: register code should not at here, here is the overall process should not contain details func ########## File path: core/src/main/java/org/apache/carbondata/core/util/CustomIndex.java ########## @@ -59,6 +59,14 @@ */ public abstract ReturnType query(String query) throws Exception; + /** + * Query processor for custom index. + * @param queryPointList query point list for GeoHashIndex + * @return Returns list of ranges to be fetched + * @throws Exception + */ + public abstract ReturnType query(List<double[]> queryPointList) throws Exception; + Review comment: please add jira url to show the theory ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ########## @@ -95,6 +95,12 @@ class CarbonEnv { sparkSession.udf.register("text_match", new TextMatchUDF) sparkSession.udf.register("text_match_with_limit", new TextMatchMaxDocUDF) sparkSession.udf.register("in_polygon", new InPolygonUDF) + sparkSession.udf.register("in_polygon_list", new InPolygonListUDF) + sparkSession.udf.register("in_polyline_list", new InPolylineListUDF) + sparkSession.udf.register("in_polygon_range_list", new InPolygonRangeListUDF) + Review comment: More and more, consider to use a function processing separately ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
MarvinLitt commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r530356529 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala ########## @@ -17,6 +17,8 @@ package org.apache.carbondata.geo +import scala.collection.mutable + Review comment: please add some more udf test for intersection status. like full inclusion, semi inclusion ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733735461 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |