ShreelekhyaG opened a new pull request #4118: URL: https://github.com/apache/carbondata/pull/4118 ### Why is this PR needed? 1. SPATIAL_INDEX property, POLYGON, LINESTRING, and RANGELIST UDF's are case sensitive. 2. SPATIAL_INDEX.xx.gridSize and SPATIAL_INDEX.xxx.conversionRatio is accepting negative values. 3. IN_POLYGON_LIST udf, buffer size is accepting negative/character values. ### What changes were proposed in this PR? 1. converted properties to lower case and made UDF's case insensitive. 2. added validation for negative/character values. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-815032923 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5140/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-815044188 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3389/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-822578777 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5211/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-822579372 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3463/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
maheshrajus commented on a change in pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#discussion_r618659835 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala ########## @@ -564,7 +564,7 @@ object CarbonSparkSqlParserUtil { def needToConvertToLowerCase(key: String): Boolean = { var noConvertList = Array(CarbonCommonConstants.COMPRESSOR, "PATH", "bad_record_path", "timestampformat", "dateformat") - if (key.startsWith(CarbonCommonConstants.SPATIAL_INDEX) && key.endsWith(".class")) { + if (key.toLowerCase.startsWith(CarbonCommonConstants.SPATIAL_INDEX) && key.endsWith(".class")) { Review comment: can we add test case for validating this ? [SPATIAL_INDEX property, POLYGON, LINESTRING, and RANGELIST UDF's are case sensitive.] ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java ########## @@ -83,6 +85,17 @@ public static long lonLat2GeoID(long longitude, long latitude, double oriLatitud return colRow2GeoID(ij[0], ij[1]); } + public static void validateUDFInputValue(Object input, String inputName, String datatype) + throws MalformedCarbonCommandException { + if (inputName.equalsIgnoreCase("gridSize") && (input == null Review comment: can we declare "gridSize" as a constant and use it ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
maheshrajus commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-825103733 why we are accepting invalid input values as per PR description ? [3.Accepting invalid values in geo UDF's.] -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
maheshrajus commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-825432437 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#discussion_r618991229 ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashIndex.java ########## @@ -136,17 +139,21 @@ public void init(String indexName, Map<String, String> properties) throws Except } String GRID_SIZE = commonKey + "gridsize"; String gridSize = properties.get(GRID_SIZE); - if (StringUtils.isEmpty(gridSize)) { + if (StringUtils.isEmpty(gridSize) || + !Pattern.compile(POSITIVE_INTEGER_REGEX).matcher(gridSize).find()) { throw new MalformedCarbonCommandException( - String.format("%s property is invalid. %s property must be specified.", - CarbonCommonConstants.SPATIAL_INDEX, GRID_SIZE)); + String.format("%s property is invalid. %s property must be specified, " + + "and the value must be positive integer.", + CarbonCommonConstants.SPATIAL_INDEX, GRID_SIZE)); } String CONVERSION_RATIO = commonKey + "conversionratio"; String conversionRatio = properties.get(CONVERSION_RATIO); - if (StringUtils.isEmpty(conversionRatio)) { + if (StringUtils.isEmpty(conversionRatio) || Review comment: Can move this validation to method and reuse ########## File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java ########## @@ -49,14 +49,18 @@ public PolygonRangeListExpression(String polygonRangeList, String opType, String public void processExpression() { // 1. parse the range list string List<String> rangeLists = new ArrayList<>(); - Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION); + Pattern pattern = + Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION, Pattern.CASE_INSENSITIVE); Matcher matcher = pattern.matcher(polygon); while (matcher.find()) { String matchedStr = matcher.group(); rangeLists.add(matchedStr); } // 2. process the range lists if (rangeLists.size() > 0) { + if (!opType.equals("OR") && !opType.equals("AND")) { Review comment: We can use GeoOperationType.getEnum(opType) and validate. ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java ########## @@ -83,6 +85,17 @@ public static long lonLat2GeoID(long longitude, long latitude, double oriLatitud return colRow2GeoID(ij[0], ij[1]); } + public static void validateUDFInputValue(Object input, String inputName, String datatype) + throws MalformedCarbonCommandException { + if (inputName.equalsIgnoreCase("gridSize") && (input == null + || Integer.parseInt(input.toString()) <= 0)) { Review comment: can we use Pattern.compile(POSITIVE_INTEGER_REGEX) to validate here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on a change in pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#discussion_r620062704 ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java ########## @@ -83,6 +85,17 @@ public static long lonLat2GeoID(long longitude, long latitude, double oriLatitud return colRow2GeoID(ij[0], ij[1]); } + public static void validateUDFInputValue(Object input, String inputName, String datatype) + throws MalformedCarbonCommandException { + if (inputName.equalsIgnoreCase("gridSize") && (input == null Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on a change in pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#discussion_r620062903 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala ########## @@ -564,7 +564,7 @@ object CarbonSparkSqlParserUtil { def needToConvertToLowerCase(key: String): Boolean = { var noConvertList = Array(CarbonCommonConstants.COMPRESSOR, "PATH", "bad_record_path", "timestampformat", "dateformat") - if (key.startsWith(CarbonCommonConstants.SPATIAL_INDEX) && key.endsWith(".class")) { + if (key.toLowerCase.startsWith(CarbonCommonConstants.SPATIAL_INDEX) && key.endsWith(".class")) { Review comment: Done. edited existing test cases. ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashIndex.java ########## @@ -136,17 +139,21 @@ public void init(String indexName, Map<String, String> properties) throws Except } String GRID_SIZE = commonKey + "gridsize"; String gridSize = properties.get(GRID_SIZE); - if (StringUtils.isEmpty(gridSize)) { + if (StringUtils.isEmpty(gridSize) || + !Pattern.compile(POSITIVE_INTEGER_REGEX).matcher(gridSize).find()) { throw new MalformedCarbonCommandException( - String.format("%s property is invalid. %s property must be specified.", - CarbonCommonConstants.SPATIAL_INDEX, GRID_SIZE)); + String.format("%s property is invalid. %s property must be specified, " + + "and the value must be positive integer.", + CarbonCommonConstants.SPATIAL_INDEX, GRID_SIZE)); } String CONVERSION_RATIO = commonKey + "conversionratio"; String conversionRatio = properties.get(CONVERSION_RATIO); - if (StringUtils.isEmpty(conversionRatio)) { + if (StringUtils.isEmpty(conversionRatio) || Review comment: Done ########## File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java ########## @@ -83,6 +85,17 @@ public static long lonLat2GeoID(long longitude, long latitude, double oriLatitud return colRow2GeoID(ij[0], ij[1]); } + public static void validateUDFInputValue(Object input, String inputName, String datatype) + throws MalformedCarbonCommandException { + if (inputName.equalsIgnoreCase("gridSize") && (input == null + || Integer.parseInt(input.toString()) <= 0)) { Review comment: Done ########## File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonRangeListExpression.java ########## @@ -49,14 +49,18 @@ public PolygonRangeListExpression(String polygonRangeList, String opType, String public void processExpression() { // 1. parse the range list string List<String> rangeLists = new ArrayList<>(); - Pattern pattern = Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION); + Pattern pattern = + Pattern.compile(GeoConstants.RANGELIST_REG_EXPRESSION, Pattern.CASE_INSENSITIVE); Matcher matcher = pattern.matcher(polygon); while (matcher.find()) { String matchedStr = matcher.group(); rangeLists.add(matchedStr); } // 2. process the range lists if (rangeLists.size() > 0) { + if (!opType.equals("OR") && !opType.equals("AND")) { Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-826623840 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5254/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-826629049 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3507/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-826686975 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3511/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-826687542 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5258/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-826729914 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#discussion_r620232262 ########## File path: integration/spark/src/main/scala/org/apache/carbondata/geo/GeoUtilUDFs.scala ########## @@ -33,34 +33,51 @@ object GeoUtilUDFs { } } -class GeoIdToGridXyUDF extends (Long => Array[Int]) with Serializable { - override def apply(geoId: Long): Array[Int] = { +class GeoIdToGridXyUDF extends (java.lang.Long => Array[Int]) with Serializable { + override def apply(geoId: java.lang.Long): Array[Int] = { + GeoHashUtils.validateUDFInputValue(geoId, "geoId", "Long") GeoHashUtils.geoID2ColRow(geoId) } } -class GeoIdToLatLngUDF extends ((Long, Double, Int) => Array[Double]) with Serializable { - override def apply(geoId: Long, oriLatitude: Double, gridSize: Int): Array[Double] = { +class GeoIdToLatLngUDF + extends ((java.lang.Long, java.lang.Double, java.lang.Integer) => Array[Double]) with + Serializable { + override def apply(geoId: java.lang.Long, oriLatitude: java.lang.Double, + gridSize: java.lang.Integer): Array[Double] = { + GeoHashUtils.validateUDFInputValue(geoId, "geoId", "Long") + GeoHashUtils.validateUDFInputValue(oriLatitude, "oriLatitude", "Double") + GeoHashUtils.validateUDFInputValue(gridSize, "gridSize", "Integer") GeoHashUtils.geoID2LatLng(geoId, oriLatitude, gridSize) } } -class LatLngToGeoIdUDF extends ((Long, Long, Double, Int) => Long) with Serializable { - override def apply(latitude: Long, longitude: Long, oriLatitude: Double, gridSize: Int): Long = { +class LatLngToGeoIdUDF extends ((java.lang.Long, java.lang.Long, + java.lang.Double, java.lang.Integer) => Long) with Serializable { + override def apply(latitude: java.lang.Long, longitude: java.lang.Long, + oriLatitude: java.lang.Double, gridSize: java.lang.Integer): Long = { + GeoHashUtils.validateUDFInputValue(latitude, "latitude", "Long") + GeoHashUtils.validateUDFInputValue(longitude, "longitude", "Long") + GeoHashUtils.validateUDFInputValue(oriLatitude, "oriLatitude", "Double") + GeoHashUtils.validateUDFInputValue(gridSize, "gridSize", "Integer") GeoHashUtils.lonLat2GeoID(longitude, latitude, oriLatitude, gridSize) } } -class ToUpperLayerGeoIdUDF extends (Long => Long) with Serializable { - override def apply(geoId: Long): Long = { +class ToUpperLayerGeoIdUDF extends (java.lang.Long => Long) with Serializable { + override def apply(geoId: java.lang.Long): Long = { + GeoHashUtils.validateUDFInputValue(geoId, "geoId", "Long") GeoHashUtils.convertToUpperLayerGeoId(geoId) } } -class ToRangeListUDF extends ((String, Double, Int) => mutable.Buffer[Array[Long]]) - with Serializable { - override def apply(polygon: String, oriLatitude: Double, - gridSize: Int): mutable.Buffer[Array[Long]] = { +class ToRangeListUDF extends ((java.lang.String, java.lang.Double, java.lang.Integer) => + mutable.Buffer[Array[Long]]) with Serializable { + override def apply(polygon: java.lang.String, oriLatitude: java.lang.Double, + gridSize: java.lang.Integer): mutable.Buffer[Array[Long]] = { + GeoHashUtils.validateUDFInputValue(polygon, "polygon", "String") Review comment: Validating here while applying udf is not good. and also throws sparkexception. Can move this validation to optimizer rule or carbonsourcestratgey -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-826792119 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3518/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA2 commented on pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#issuecomment-826796034 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5264/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #4118: URL: https://github.com/apache/carbondata/pull/4118#discussion_r620858120 ########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonSourceStrategy.scala ########## @@ -62,10 +62,53 @@ private[sql] object CarbonSourceStrategy extends SparkStrategy { } catch { case _: CarbonPhysicalPlanException => Nil } + case Project(projectList, _: OneRowRelation) if validateUdf(projectList) => Nil case _ => Nil } } + private def validateUdf(projects: Seq[NamedExpression]): Boolean = { + projects foreach { + case alias: Alias if alias.child.isInstanceOf[Expression] => + alias.child match { + case Cast(s: ScalaUDF, _, _) => validateGeoUtilUDFs(s) + case s: ScalaUDF => validateGeoUtilUDFs(s) + case _ => + } + } + true + } + + def validateGeoUtilUDFs(s: ScalaUDF): Boolean = { + var geoUdf = false Review comment: Can remove this local variable. If invalid data is provided by user, then it will throw exception. Else, return true -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |