CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680182451 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3865/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680183052 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2124/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477059456 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala ########## @@ -68,4 +71,32 @@ object BadRecordUtil { badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(badRecordLocation)) } + + def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = { + val out = new BufferedWriter(new FileWriter(csvPath)) + val writer: CSVWriter = new CSVWriter(out) + try { + for (row <- rows) { + writer.writeNext(row) + } + } + catch { + case e: Exception => + Assert.fail(e.getMessage) + } + finally { + out.close() + writer.close() + } + } + + def deleteCSVFile(csvPath: String): Unit = { Review comment: ok ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477059876 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala ########## @@ -68,4 +71,32 @@ object BadRecordUtil { badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(badRecordLocation)) } + + def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = { + val out = new BufferedWriter(new FileWriter(csvPath)) Review comment: could not find proper try-with-resource in scala. keeping it the same. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477059846 ########## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ########## @@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) { dateFormatter = timestampFormatter.get(); } dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + return timeValue; } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { + try { + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + dateFormatter.setLenient(false); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + LOGGER.info("Parsed data with lenience as true, setting back to default mode"); + return timeValue; + } catch (ParseException ex) { + dateFormatter.setLenient(false); Review comment: Can move setlenient to false to finally block ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477059942 ########## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ########## @@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) { dateFormatter = timestampFormatter.get(); } dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + return timeValue; } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { + try { + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + dateFormatter.setLenient(false); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + LOGGER.info("Parsed data with lenience as true, setting back to default mode"); + return timeValue; + } catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Failed to parse data with lenience as true, setting back to default mode"); + throw new NumberFormatException(ex.getMessage()); + } + } else { + throw new NumberFormatException(e.getMessage()); + } + } + } + + private static void validateTimeStampRange(Long timeValue) { + if (timeValue < DateDirectDictionaryGenerator.MIN_VALUE + || timeValue > DateDirectDictionaryGenerator.MAX_VALUE) { + throw new NumberFormatException( + "timestamp column data value: " + timeValue + "is not in valid " + "range of: " Review comment: ok changed ########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ########## @@ -1592,6 +1592,15 @@ private CarbonCommonConstants() { public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false"; + // Property to enable parsing the timestamp/date data with setLenient = true in load + // flow if it fails with parse invalid timestamp data. (example: 1941-03-15 00:00:00 + // is valid time in Asia/Calcutta zone and is invalid and will fail to parse in Asia/Shanghai + // zone as DST is observed and clocks were turned forward 1 hour to 1941-03-15 01:00:00) + @CarbonProperty(dynamicConfigurable = true) public static final String Review comment: ok ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477061165 ########## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ########## @@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) { dateFormatter = timestampFormatter.get(); } dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + return timeValue; } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { + try { + dateFormatter.setLenient(true); Review comment: Please add comments in what scenario it is required ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477080281 ########## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ########## @@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) { dateFormatter = timestampFormatter.get(); } dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + return timeValue; } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { + try { + dateFormatter.setLenient(true); Review comment: added an example. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477080590 ########## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ########## @@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) { dateFormatter = timestampFormatter.get(); } dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + return timeValue; } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { + try { + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + dateFormatter.setLenient(false); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + LOGGER.info("Parsed data with lenience as true, setting back to default mode"); + return timeValue; + } catch (ParseException ex) { + dateFormatter.setLenient(false); Review comment: ok ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680755157 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2132/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680755507 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3873/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
akashrn5 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680780532 LGTM ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680785361 LGTM ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r478153107 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala ########## @@ -306,7 +315,51 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA } } + test("test load, update data with setlenient carbon property for daylight " + + "saving time from different timezone") { + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE, "true") + TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) + sql("DROP TABLE IF EXISTS test_time") + sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " + + "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ") + sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time") + sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ") + sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'") + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + sql("DROP TABLE test_time") + CarbonProperties.getInstance().removeProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE) + } + + test("test load, update data with setlenient session level property for daylight " + + "saving time from different timezone") { + sql("set carbon.load.dateformat.setlenient.enable = true") + TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) + sql("DROP TABLE IF EXISTS test_time") + sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " + + "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ") + sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time") + sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ") + sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'") + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + sql("DROP TABLE test_time") + defaultConfig() + } + + def generateCSVFile(): Unit = { + val rows = new ListBuffer[Array[String]] + rows += Array("ID", "date", "time") + rows += Array("1", "1941-3-15", "1941-3-15 00:00:00") + rows += Array("2", "2016-7-24", "2016-7-24 01:02:30") + BadRecordUtil.createCSV(rows, csvPath) + } + override def afterAll { sql("DROP TABLE IF EXISTS t3") + FileUtils.forceDelete(new File(csvPath)) + TimeZone.setDefault(defaultTimeZone) Review comment: `afterAll()` is called only once per testcase file. But, have to set back`TimeZone.setDefault(defaultTimeZone)` at the end of the testcase where you changed zone to China/Shanghai. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
VenuReddy2103 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r478153107 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala ########## @@ -306,7 +315,51 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA } } + test("test load, update data with setlenient carbon property for daylight " + + "saving time from different timezone") { + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE, "true") + TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) + sql("DROP TABLE IF EXISTS test_time") + sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " + + "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ") + sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time") + sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ") + sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'") + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + sql("DROP TABLE test_time") + CarbonProperties.getInstance().removeProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE) + } + + test("test load, update data with setlenient session level property for daylight " + + "saving time from different timezone") { + sql("set carbon.load.dateformat.setlenient.enable = true") + TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) + sql("DROP TABLE IF EXISTS test_time") + sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " + + "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ") + sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time") + sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ") + sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'") + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))) + sql("DROP TABLE test_time") + defaultConfig() + } + + def generateCSVFile(): Unit = { + val rows = new ListBuffer[Array[String]] + rows += Array("ID", "date", "time") + rows += Array("1", "1941-3-15", "1941-3-15 00:00:00") + rows += Array("2", "2016-7-24", "2016-7-24 01:02:30") + BadRecordUtil.createCSV(rows, csvPath) + } + override def afterAll { sql("DROP TABLE IF EXISTS t3") + FileUtils.forceDelete(new File(csvPath)) + TimeZone.setDefault(defaultTimeZone) Review comment: `afterAll()` is called only once per testcase file. But, have to set back`TimeZone.setDefault(defaultTimeZone)` at the end of the testcase where you changed zone to China/Shanghai. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
VenuReddy2103 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-681588563 LGTM ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
asfgit closed pull request #3896: URL: https://github.com/apache/carbondata/pull/3896 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |