[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3896: [WIP] Fix load failures due to daylight saving time changes

classic Classic list List threaded Threaded
77 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox

akashrn5 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475716479



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -435,18 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
-    DateFormat dateFormatter;
+    DateFormat dateFormatter = null;
+    long timeValue;
     try {
       if (null != dateFormat && !dateFormat.trim().isEmpty()) {
         dateFormatter = new SimpleDateFormat(dateFormat);
-        dateFormatter.setLenient(false);
       } else {
         dateFormatter = timestampFormatter.get();
       }
+      dateFormatter.setLenient(false);
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      timeValue = dateToStr.getTime();
+      validateTimeStampRange(timeValue);
+      return timeValue;
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          LOGGER.info("Changing setLenient to true");
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");

Review comment:
       i think these logs not required here, may be you can add only one log say after line 462, saying you had set true as parsing failed for invalid data and parsing finished, something meaningful like this and in line 466, you can say `"failed to parse data with lenience as true, setting back to default"` like this , it will look clean and good

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/badrecordloger/BadRecordActionTest.scala
##########
@@ -270,6 +274,53 @@ class BadRecordActionTest extends QueryTest {
     }
   }
 
+  test("test bad record FAIL with invalid timestamp range") {
+    val csvPath = s"$resourcesPath/badrecords/invalidTimeStampRange.csv"
+    val rows1 = new ListBuffer[Array[String]]
+    rows1 += Array("ID", "date", "time")
+    rows1 += Array("1", "2016-7-24", "342016-7-24 01:02:30")
+    createCSV(rows1, csvPath)
+    sql("DROP TABLE IF EXISTS test_time")
+    sql("""
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm:ss')
+        """)
+    val exception = intercept[Exception] {
+      sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/badrecords/invalidTimeStampRange.csv' " +
+          s"into table test_time options ('bad_records_action'='fail')")
+    }
+    assert(exception.getMessage
+      .contains(
+        "Data load failed due to bad record: The value with column name time and column data" +
+        " type TIMESTAMP is not a valid TIMESTAMP type.Please enable bad record logger to know" +
+        " the detail reason"))
+    sql("DROP TABLE IF EXISTS test_time")
+    deleteCSVFile(csvPath)
+  }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+    val out = new BufferedWriter(new FileWriter(csvPath))
+    val writer: CSVWriter = new CSVWriter(out)
+
+    for (row <- rows) {
+      writer.writeNext(row)
+    }
+    out.close()
+    writer.close()
+  }
+
+  def deleteCSVFile(csvPath: String): Unit = {

Review comment:
       no need to create new method only for delete, call the file delete API directly in test case as not used in many test case.

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(

Review comment:
       put check answers in single line

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    CarbonProperties.getInstance().removeProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for daylight " +
+       "saving time from different timezone") {
+    sql("set carbon.load.setlenient.enable = true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    defaultConfig()
+    sqlContext.sparkSession.conf.unset("carbon.load.setlenient.enable")
+  }
+
+  def generateCSVFile(): Unit = {
+    val rows1 = new ListBuffer[Array[String]]

Review comment:
       ```suggestion
       val rows = new ListBuffer[Array[String]]
   ```

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    CarbonProperties.getInstance().removeProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for daylight " +
+       "saving time from different timezone") {
+    sql("set carbon.load.setlenient.enable = true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    defaultConfig()
+    sqlContext.sparkSession.conf.unset("carbon.load.setlenient.enable")
+  }
+
+  def generateCSVFile(): Unit = {
+    val rows1 = new ListBuffer[Array[String]]
+    rows1 += Array("ID", "date", "time")
+    rows1 += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+    rows1 += Array("2", "2016-7-24", "2016-7-24 01:02:30")
+    createCSV(rows1, csvPath1)
+  }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {

Review comment:
       instead of duplicate code, copy this method in any test util class, so that later anyone can reuse this.

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/badrecordloger/BadRecordActionTest.scala
##########
@@ -270,6 +274,53 @@ class BadRecordActionTest extends QueryTest {
     }
   }
 
+  test("test bad record FAIL with invalid timestamp range") {
+    val csvPath = s"$resourcesPath/badrecords/invalidTimeStampRange.csv"
+    val rows1 = new ListBuffer[Array[String]]
+    rows1 += Array("ID", "date", "time")
+    rows1 += Array("1", "2016-7-24", "342016-7-24 01:02:30")
+    createCSV(rows1, csvPath)
+    sql("DROP TABLE IF EXISTS test_time")
+    sql("""
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm:ss')
+        """)
+    val exception = intercept[Exception] {
+      sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/badrecords/invalidTimeStampRange.csv' " +
+          s"into table test_time options ('bad_records_action'='fail')")
+    }
+    assert(exception.getMessage
+      .contains(
+        "Data load failed due to bad record: The value with column name time and column data" +
+        " type TIMESTAMP is not a valid TIMESTAMP type.Please enable bad record logger to know" +
+        " the detail reason"))
+    sql("DROP TABLE IF EXISTS test_time")
+    deleteCSVFile(csvPath)
+  }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+    val out = new BufferedWriter(new FileWriter(csvPath))
+    val writer: CSVWriter = new CSVWriter(out)
+

Review comment:
       remove empty line, and add in try catch for writing and close in finally in a standard way.

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/badrecordloger/BadRecordActionTest.scala
##########
@@ -270,6 +274,53 @@ class BadRecordActionTest extends QueryTest {
     }
   }
 
+  test("test bad record FAIL with invalid timestamp range") {
+    val csvPath = s"$resourcesPath/badrecords/invalidTimeStampRange.csv"
+    val rows1 = new ListBuffer[Array[String]]
+    rows1 += Array("ID", "date", "time")
+    rows1 += Array("1", "2016-7-24", "342016-7-24 01:02:30")
+    createCSV(rows1, csvPath)
+    sql("DROP TABLE IF EXISTS test_time")
+    sql("""

Review comment:
       can format create SQL in just two lines, as we do not worry about scala style in test files. We can avoid duplicate lines or more lines

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """

Review comment:
       same as above

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    CarbonProperties.getInstance().removeProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for daylight " +
+       "saving time from different timezone") {
+    sql("set carbon.load.setlenient.enable = true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(

Review comment:
       same as above

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    CarbonProperties.getInstance().removeProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for daylight " +
+       "saving time from different timezone") {
+    sql("set carbon.load.setlenient.enable = true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(

Review comment:
       same as above

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    CarbonProperties.getInstance().removeProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for daylight " +
+       "saving time from different timezone") {
+    sql("set carbon.load.setlenient.enable = true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    defaultConfig()
+    sqlContext.sparkSession.conf.unset("carbon.load.setlenient.enable")
+  }
+
+  def generateCSVFile(): Unit = {
+    val rows1 = new ListBuffer[Array[String]]
+    rows1 += Array("ID", "date", "time")
+    rows1 += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+    rows1 += Array("2", "2016-7-24", "2016-7-24 01:02:30")
+    createCSV(rows1, csvPath1)
+  }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+    val out = new BufferedWriter(new FileWriter(csvPath))
+    val writer: CSVWriter = new CSVWriter(out)
+

Review comment:
       same comment as other test file, applies to this file.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476183063



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true in load
+  // flow if it fails with parse invalid timestamp data.
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SETLENIENT_ENABLE = "carbon.load.setlenient.enable";

Review comment:
       ```suggestion
     public static final String CARBON_LOAD_TIMESTAMP_SETLENIENT_ENABLE = "carbon.load.setlenient.enable";
   ```
   as this property is specific to timestamp




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476183853



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -435,18 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
-    DateFormat dateFormatter;
+    DateFormat dateFormatter = null;
+    long timeValue;
     try {
       if (null != dateFormat && !dateFormat.trim().isEmpty()) {
         dateFormatter = new SimpleDateFormat(dateFormat);
-        dateFormatter.setLenient(false);

Review comment:
       I think you dont have to move this line down. as setLenient for timestampFormatter variable is handled during initialisation




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476184514



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -435,18 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
-    DateFormat dateFormatter;
+    DateFormat dateFormatter = null;
+    long timeValue;
     try {
       if (null != dateFormat && !dateFormat.trim().isEmpty()) {
         dateFormatter = new SimpleDateFormat(dateFormat);
-        dateFormatter.setLenient(false);
       } else {
         dateFormatter = timestampFormatter.get();
       }
+      dateFormatter.setLenient(false);
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      timeValue = dateToStr.getTime();
+      validateTimeStampRange(timeValue);
+      return timeValue;
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          LOGGER.info("Changing setLenient to true");
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          timeValue = dateToStr.getTime();
+          validateTimeStampRange(timeValue);
+          return timeValue;
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          throw new NumberFormatException(ex.getMessage());
+        }
+      } else {
+        throw new NumberFormatException(e.getMessage());
+      }
+    }
+  }
+
+  private static void validateTimeStampRange(Long timeValue) {
+    if (timeValue < DateDirectDictionaryGenerator.MIN_VALUE
+        || timeValue > DateDirectDictionaryGenerator.MAX_VALUE) {
+      throw new NumberFormatException("timestamp column data is not in valid range of: "

Review comment:
       Can print timeValue also in the log




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476186302



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)

Review comment:
       Can move this to afterAll, as if testcase fails, TimeZone will be left Unset to default




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476183063



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true in load
+  // flow if it fails with parse invalid timestamp data.
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SETLENIENT_ENABLE = "carbon.load.setlenient.enable";

Review comment:
       ```suggestion
     public static final String CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE = "carbon.load.setlenient.enable";
   ```
   as this property is specific to date format




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476243946



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)

Review comment:
       ok done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476244034



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -435,18 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
-    DateFormat dateFormatter;
+    DateFormat dateFormatter = null;
+    long timeValue;
     try {
       if (null != dateFormat && !dateFormat.trim().isEmpty()) {
         dateFormatter = new SimpleDateFormat(dateFormat);
-        dateFormatter.setLenient(false);
       } else {
         dateFormatter = timestampFormatter.get();
       }
+      dateFormatter.setLenient(false);
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      timeValue = dateToStr.getTime();
+      validateTimeStampRange(timeValue);
+      return timeValue;
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          LOGGER.info("Changing setLenient to true");
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          timeValue = dateToStr.getTime();
+          validateTimeStampRange(timeValue);
+          return timeValue;
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          throw new NumberFormatException(ex.getMessage());
+        }
+      } else {
+        throw new NumberFormatException(e.getMessage());
+      }
+    }
+  }
+
+  private static void validateTimeStampRange(Long timeValue) {
+    if (timeValue < DateDirectDictionaryGenerator.MIN_VALUE
+        || timeValue > DateDirectDictionaryGenerator.MAX_VALUE) {
+      throw new NumberFormatException("timestamp column data is not in valid range of: "

Review comment:
       ok




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476244302



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true in load
+  // flow if it fails with parse invalid timestamp data.
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SETLENIENT_ENABLE = "carbon.load.setlenient.enable";

Review comment:
       okay




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476244994



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true in load
+  // flow if it fails with parse invalid timestamp data.

Review comment:
       ok added example




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476269555



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##########
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    CarbonProperties.getInstance().removeProperty(
+      CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for daylight " +
+       "saving time from different timezone") {
+    sql("set carbon.load.setlenient.enable = true")
+    val defaultTimeZone = TimeZone.getDefault
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql(
+      """
+           CREATE TABLE IF NOT EXISTS test_time
+           (ID Int, date Date, time Timestamp)
+           STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy-MM-dd',
+           'timestampformat'='yyyy-MM-dd HH:mm')
+        """)
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    checkAnswer(
+      sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+    )
+    sql("DROP TABLE test_time")
+    TimeZone.setDefault(defaultTimeZone)
+    defaultConfig()
+    sqlContext.sparkSession.conf.unset("carbon.load.setlenient.enable")
+  }
+
+  def generateCSVFile(): Unit = {
+    val rows1 = new ListBuffer[Array[String]]
+    rows1 += Array("ID", "date", "time")
+    rows1 += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+    rows1 += Array("2", "2016-7-24", "2016-7-24 01:02:30")
+    createCSV(rows1, csvPath1)
+  }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {

Review comment:
       moved createCSV and deleteCSV to a test util class




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-679916329


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3857/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-679920810


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2116/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476380921



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) {
         dateFormatter = timestampFormatter.get();
       }
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      timeValue = dateToStr.getTime();
+      validateTimeStampRange(timeValue);
+      return timeValue;
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          dateFormatter.setLenient(false);
+          timeValue = dateToStr.getTime();
+          validateTimeStampRange(timeValue);
+          LOGGER.info("Parsed data with lenience as true, setting back to default mode");
+          return timeValue;
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
+          LOGGER.info("Failed to parse data with lenience as true, setting back to default mode");

Review comment:
       This log is not needed i think, since we are throwing exception




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476386840



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala
##########
@@ -68,4 +71,34 @@ object BadRecordUtil {
     badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName
     FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(badRecordLocation))
   }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+    val out = new BufferedWriter(new FileWriter(csvPath))
+    val writer: CSVWriter = new CSVWriter(out)
+    try {
+      for (row <- rows) {
+        writer.writeNext(row)
+      }
+    }
+    catch {
+      case e: Exception =>
+        e.printStackTrace()

Review comment:
       Remove this line




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476387135



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala
##########
@@ -68,4 +71,34 @@ object BadRecordUtil {
     badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName
     FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(badRecordLocation))
   }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+    val out = new BufferedWriter(new FileWriter(csvPath))
+    val writer: CSVWriter = new CSVWriter(out)
+    try {
+      for (row <- rows) {
+        writer.writeNext(row)
+      }
+    }
+    catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+    finally {
+      out.close()
+      writer.close()
+    }
+  }
+
+  def deleteCSVFile(csvPath: String): Unit = {
+    try {
+      FileUtils.forceDelete(new File(csvPath))
+    }
+    catch {
+      case e: Exception =>
+        e.printStackTrace()

Review comment:
       Remove this line




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476380921



##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) {
         dateFormatter = timestampFormatter.get();
       }
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      timeValue = dateToStr.getTime();
+      validateTimeStampRange(timeValue);
+      return timeValue;
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          dateFormatter.setLenient(false);
+          timeValue = dateToStr.getTime();
+          validateTimeStampRange(timeValue);
+          LOGGER.info("Parsed data with lenience as true, setting back to default mode");
+          return timeValue;
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
+          LOGGER.info("Failed to parse data with lenience as true, setting back to default mode");

Review comment:
       This log is not needed i think, since we are throwing exception




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680055956


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3861/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476487539



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -1592,6 +1592,15 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true in load
+  // flow if it fails with parse invalid timestamp data. (example: 1941-03-15 00:00:00
+  // is valid time in Asia/Calcutta zone and is invalid and will fail to parse in Asia/Shanghai
+  // zone as DST is observed and clocks were turned forward 1 hour to 1941-03-15 01:00:00)
+  @CarbonProperty(dynamicConfigurable = true) public static final String

Review comment:
       `move public static final String` to next line

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) {
         dateFormatter = timestampFormatter.get();
       }
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      timeValue = dateToStr.getTime();
+      validateTimeStampRange(timeValue);
+      return timeValue;
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          dateFormatter.setLenient(false);
+          timeValue = dateToStr.getTime();
+          validateTimeStampRange(timeValue);
+          LOGGER.info("Parsed data with lenience as true, setting back to default mode");
+          return timeValue;
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
+          LOGGER.info("Failed to parse data with lenience as true, setting back to default mode");
+          throw new NumberFormatException(ex.getMessage());
+        }
+      } else {
+        throw new NumberFormatException(e.getMessage());
+      }
+    }
+  }
+
+  private static void validateTimeStampRange(Long timeValue) {
+    if (timeValue < DateDirectDictionaryGenerator.MIN_VALUE
+        || timeValue > DateDirectDictionaryGenerator.MAX_VALUE) {
+      throw new NumberFormatException(
+          "timestamp column data value: " + timeValue + "is not in valid " + "range of: "

Review comment:
       `"is not in valid " + "range of: "` correct the formatting here, there are unnecessary concatenations

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala
##########
@@ -68,4 +71,32 @@ object BadRecordUtil {
     badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName
     FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(badRecordLocation))
   }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+    val out = new BufferedWriter(new FileWriter(csvPath))
+    val writer: CSVWriter = new CSVWriter(out)
+    try {
+      for (row <- rows) {
+        writer.writeNext(row)
+      }
+    }
+    catch {
+      case e: Exception =>
+        Assert.fail(e.getMessage)
+    }
+    finally {
+      out.close()
+      writer.close()
+    }
+  }
+
+  def deleteCSVFile(csvPath: String): Unit = {

Review comment:
       i don't think this method is required, just call `FileUtils.forceDelete(new File(csvPath))` in each test case itself and no need of any Assert here, as its not doing any functional validation

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala
##########
@@ -68,4 +71,32 @@ object BadRecordUtil {
     badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName
     FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(badRecordLocation))
   }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+    val out = new BufferedWriter(new FileWriter(csvPath))

Review comment:
       correct the formatting of this method and may be you can use try-with-resource here, you can check once




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680058834


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2120/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


1234