Karan980 opened a new pull request #4046: URL: https://github.com/apache/carbondata/pull/4046

### Why is this PR needed?
When a date or timestamp column is present inside a complex column (e.g. Array(Date)), reading through the SDK gives wrong results.

### What changes were proposed in this PR?
Fix the conversion of INT into date and LONG into timestamp values.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]
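For context on the fix, the conversion being repaired can be illustrated with a minimal stdlib sketch. This is not CarbonData's implementation: it assumes DATE is surfaced as an int counting days since the Unix epoch and TIMESTAMP as a long of epoch milliseconds, and hard-codes the "yyyy-MM-dd" / "yyyy-MM-dd HH:mm:ss" patterns used by the PR's test data; the actual formats in CarbonData are configurable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

public class DateTimestampFormatSketch {

  // Build a formatter pinned to UTC so the output is deterministic across machines.
  private static SimpleDateFormat utcFormat(String pattern) {
    SimpleDateFormat f = new SimpleDateFormat(pattern);
    f.setTimeZone(TimeZone.getTimeZone("UTC"));
    return f;
  }

  /** DATE assumed to be stored as days since the Unix epoch (an int). */
  static String formatDate(int daysSinceEpoch) {
    return utcFormat("yyyy-MM-dd").format(new Date(TimeUnit.DAYS.toMillis(daysSinceEpoch)));
  }

  /** TIMESTAMP assumed to be stored as milliseconds since the Unix epoch (a long). */
  static String formatTimestamp(long millisSinceEpoch) {
    return utcFormat("yyyy-MM-dd HH:mm:ss").format(new Date(millisSinceEpoch));
  }

  public static void main(String[] args) {
    System.out.println(formatDate(17957));    // day offset corresponding to 2019-03-02
    System.out.println(formatTimestamp(0L));  // epoch start
  }
}
```

The bug class this sketch illustrates is a reader returning the raw Integer/Long instead of applying such a format step to values nested inside complex columns.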
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-739950614

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3339/
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-739951736

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5100/
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-740382467

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3347/
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-740382885

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5109/
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-741589081

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5122/
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-741589722

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3360/
Indhumathi27 commented on a change in pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#discussion_r542237004

########## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ##########
@@ -126,6 +127,47 @@ public T readNextRow() throws IOException, InterruptedException {
     return formatDateAndTimeStamp((Object []) row);
   }

+  public Object getFormattedData(CarbonDimension dimension, Object row, SimpleDateFormat dateFormat,

Review comment: Add method description
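For readers following the review, the role of a getFormattedData-style helper can be sketched with plain JDK types: complex children come back as nested Object[] arrays, and the leaves need the date/timestamp conversion applied recursively. This is a hedged sketch, not CarbonData's implementation; it assumes DATE leaves arrive as Integer days-since-epoch and TIMESTAMP leaves as Long epoch milliseconds, which may differ from the SDK's actual internal representation.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

public class NestedFormatSketch {

  enum Kind { DATE, TIMESTAMP }

  /**
   * Recursively walks a row value: complex children (ARRAY/STRUCT) arrive as
   * Object[], primitives as boxed Integer/Long. Formats every matching leaf.
   */
  static Object format(Object value, Kind kind, SimpleDateFormat fmt) {
    if (value instanceof Object[]) {
      Object[] children = (Object[]) value;
      for (int i = 0; i < children.length; i++) {
        children[i] = format(children[i], kind, fmt);
      }
      return children;
    }
    if (kind == Kind.DATE && value instanceof Integer) {
      return fmt.format(new Date(TimeUnit.DAYS.toMillis((Integer) value)));
    }
    if (kind == Kind.TIMESTAMP && value instanceof Long) {
      return fmt.format(new Date((Long) value));
    }
    return value; // already a string, or a type we do not touch
  }

  /** Convenience wrapper: format a DATE array with a UTC-pinned "yyyy-MM-dd" pattern. */
  static Object[] formatDateArrayUtc(Object[] values) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    return (Object[]) format(values, Kind.DATE, fmt);
  }

  public static void main(String[] args) {
    // An Array(Date) leaf as it might be read off disk: boxed day offsets.
    Object[] formatted = formatDateArrayUtc(new Object[]{17957, 17958});
    System.out.println(formatted[0] + ", " + formatted[1]);
  }
}
```

Without the recursive walk, only top-level DATE/TIMESTAMP columns get formatted, which matches the symptom the PR description reports for Array(Date).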
Indhumathi27 commented on a change in pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#discussion_r542237261

########## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ##########
@@ -1741,6 +1742,246 @@ public boolean accept(File dir, String name) {
     }
   }

+  @Test
+  public void testReadDateAndTimestampColumnInArray() {
+    String path = "./testWriteFiles";
+    try {
+      FileUtils.deleteDirectory(new File(path));
+      Field[] fields = new Field[11];
+      fields[0] = new Field("stringField", DataTypes.STRING);
+      fields[1] = new Field("shortField", DataTypes.SHORT);
+      fields[2] = new Field("dateField", DataTypes.DATE);
+      fields[3] = new Field("timeField", DataTypes.TIMESTAMP);
+      fields[4] = new Field("varcharField", DataTypes.VARCHAR);
+      fields[5] = new Field("arrayFieldDate", DataTypes.createArrayType(DataTypes.DATE));
+      fields[6] = new Field("arrayFieldTimestamp", DataTypes.createArrayType(DataTypes.TIMESTAMP));
+      Map<String, String> map = new HashMap<>();
+      map.put("complex_delimiter_level_1", "#");
+      CarbonWriter writer = CarbonWriter.builder()
+          .outputPath(path)
+          .withLoadOptions(map)
+          .withCsvInput(new Schema(fields))
+          .writtenBy("CarbonReaderTest")
+          .build();
+
+      for (int i = 0; i < 10; i++) {
+        String[] row2 = new String[]{
+            "robot" + (i % 10),
+            String.valueOf(i % 10000),
+            "2019-03-02",
+            "2019-02-12 03:03:34",
+            "varchar",
+            "2019-03-02#2019-03-03#2019-03-04#2019-03-05",
+            "2019-02-12 03:03:34#2019-02-12 03:03:38#2019-02-12 03:03:41#2019-02-12 03:12:34"
+        };
+        writer.write(row2);
+      }
+      writer.close();
+      File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
+        @Override
+        public boolean accept(File dir, String name) {
+          if (name == null) {
+            return false;
+          }
+          return name.endsWith("carbondata");
+        }
+      });
+      if (dataFiles == null || dataFiles.length < 1) {
+        throw new RuntimeException("Carbon data file not exists.");
+      }
+      Schema schema = CarbonSchemaReader
+          .readSchema(dataFiles[0].getAbsolutePath())
+          .asOriginOrder();
+      // Transform the schema
+      String[] strings = new String[schema.getFields().length];
+      for (int i = 0; i < schema.getFields().length; i++) {
+        strings[i] = (schema.getFields())[i].getFieldName();
+      }
+      // Read data
+      CarbonReader reader = CarbonReader
+          .builder(path)
+          .projection(strings)
+          .build();
+
+      int i = 0;
+      while (reader.hasNext()) {
+        Object[] row = (Object[]) reader.readNextRow();
+        assert (row[0].equals("robot" + i));
+        assert (row[2].equals("2019-03-02"));
+        assert (row[3].equals("2019-02-12 03:03:34"));
+        Object[] arrDate = (Object[]) row[5];
+        assert (arrDate[0].equals("2019-03-02"));
+        assert (arrDate[1].equals("2019-03-03"));
+        assert (arrDate[2].equals("2019-03-04"));
+        assert (arrDate[3].equals("2019-03-05"));
+        Object[] arrTimestamp = (Object[]) row[6];
+        assert (arrTimestamp[0].equals("2019-02-12 03:03:34"));
+        assert (arrTimestamp[1].equals("2019-02-12 03:03:38"));
+        assert (arrTimestamp[2].equals("2019-02-12 03:03:41"));
+        assert (arrTimestamp[3].equals("2019-02-12 03:12:34"));
+        i++;
+      }
+      Assert.assertEquals(i, 10);
+      reader.close();
+      FileUtils.deleteDirectory(new File(path));
+    } catch (Throwable e) {
+      e.printStackTrace();
+      Assert.fail(e.getMessage());
+    }
+  }
+
+  @Test public void testReadDateAndTimestampColumnInStruct()

Review comment: can you add a testcase with Map type having Date/Timestamp as key/value

########## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ##########

+  @Test public void testReadDateAndTimestampColumnInStruct()
+      throws IOException, InvalidLoadOptionException, InterruptedException {
+    String path = "./testWriteFiles";
+    try {
+      FileUtils.deleteDirectory(new File(path));
+      Field[] fields = new Field[3];
+      fields[0] = new Field("name", DataTypes.STRING);
+      fields[1] = new Field("age", DataTypes.INT);
+      ArrayList<StructField> structFields = new ArrayList<>();
+      structFields.add(new StructField("dateField", DataTypes.DATE));
+      structFields.add(new StructField("timestampField", DataTypes.TIMESTAMP));
+      fields[2] = new Field("structField", DataTypes.createStructType(structFields));
+      Map<String, String> map = new HashMap<>();
+      map.put("complex_delimiter_level_1", "#");
+      CarbonWriter writer = CarbonWriter.builder()
+          .outputPath(path)
+          .withLoadOptions(map)
+          .withCsvInput(new Schema(fields))
+          .writtenBy("CarbonReaderTest")
+          .build();
+
+      for (int i = 0; i < 10; i++) {
+        String[] row2 = new String[]{
+            "robot" + (i % 10),
+            String.valueOf(i % 10000),
+            "2019-03-02#2019-02-12 03:12:34"
+        };
+        writer.write(row2);
+      }
+      writer.close();
+      File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
+        @Override
+        public boolean accept(File dir, String name) {
+          if (name == null) {
+            return false;
+          }
+          return name.endsWith("carbondata");
+        }
+      });
+      if (dataFiles == null || dataFiles.length < 1) {
+        throw new RuntimeException("Carbon data file not exists.");
+      }
+      Schema schema = CarbonSchemaReader
+          .readSchema(dataFiles[0].getAbsolutePath())
+          .asOriginOrder();
+      // Transform the schema
+      String[] strings = new String[schema.getFields().length];
+      for (int i = 0; i < schema.getFields().length; i++) {
+        strings[i] = (schema.getFields())[i].getFieldName();
+      }
+      // Read data
+      CarbonReader reader = CarbonReader
+          .builder(path)
+          .projection(strings)
+          .build();
+      int i = 0;
+      while (reader.hasNext()) {
+        Object[] row = (Object[]) reader.readNextRow();
+        assert (row[0].equals("robot" + i));
+        Object[] arr = (Object[]) row[2];
+        assert (arr[0].equals("2019-03-02"));
+        assert (arr[1].equals("2019-02-12 03:12:34"));
+        i++;
+      }
+      Assert.assertEquals(i, 10);
+      reader.close();
+      FileUtils.deleteDirectory(new File(path));
+    } catch (Throwable e) {
+      e.printStackTrace();
+      Assert.fail(e.getMessage());
+    }
+  }

Review comment: Remove Exception if not used

########## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ##########

+  @Test
+  public void testReadingDateAndTimestampColumnInArrayOfStruct() throws IOException {
+    String path = "./testWriteFilesArrayStruct";
+    FileUtils.deleteDirectory(new File(path));
+    Field[] fields = new Field[4];
+    fields[0] = new Field("id", DataTypes.STRING);
+    fields[1] = new Field("source", DataTypes.STRING);
+    fields[2] = new Field("usage", DataTypes.STRING);
+    List<StructField> structFieldsList = new ArrayList<>();
+    structFieldsList.add(new StructField("name", DataTypes.STRING));
+    structFieldsList.add(new StructField("type", DataTypes.STRING));
+    structFieldsList.add(new StructField("creation-date", DataTypes.DATE));
+    structFieldsList.add(new StructField("creation-timestamp", DataTypes.TIMESTAMP));
+    StructField structTypeByList =
+        new StructField("annotation", DataTypes.createStructType(structFieldsList), structFieldsList);
+    List<StructField> list = new ArrayList<>();
+    list.add(structTypeByList);
+    Field arrayType = new Field("annotations", "array", list);
+    fields[3] = arrayType;
+    try {
+      CarbonWriterBuilder builder = CarbonWriter.builder().outputPath(path);
+      CarbonWriter writer = builder.withCsvInput(new Schema(fields))
+          .writtenBy("complexTest")
+          .build();
+      for (int i = 0; i < 15; i++) {
+        String[] row = new String[]{
+            "robot" + i,
+            String.valueOf(i),
+            i + "." + i,
+            "sunflowers" + (i % 10) + "\002" + "modelarts/image_classification" + "\002" + "2019-03-30" + "\002" + "2019-03-30 17:22:31"
+                + "\001"
+                + "roses" + (i % 10) + "\002" + "modelarts/image_classification" + "\002" + "2019-03-30" + "\002" + "2019-03-30 17:22:31"};
+        writer.write(row);
+      }
+      writer.close();
+    } catch (Exception e) {
+      e.printStackTrace();
+      Assert.fail();
+    }
+    Schema schema = CarbonSchemaReader
+        .readSchema(path)
+        .asOriginOrder();
+    assert (4 == schema.getFieldsLength());
+    Field[] fields1 = schema.getFields();

Review comment: Remove variable if not used
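The quoted tests feed complex values through CSV cells: a level-1 delimiter separates array elements and a level-2 delimiter separates struct fields (some tests override level 1 with "#", while the array-of-struct test uses the \001/\002 characters directly). A minimal stdlib sketch of that two-level split; `parseArrayOfStruct` is a hypothetical helper for illustration, not a CarbonData API:

```java
public class ComplexDelimiterSketch {

  /** Splits a CSV cell into array elements, then each element into struct fields. */
  static String[][] parseArrayOfStruct(String cell, String level1, String level2) {
    // String.split takes a regex; "#", \001, and \002 all match themselves literally.
    String[] elements = cell.split(level1, -1);
    String[][] out = new String[elements.length][];
    for (int i = 0; i < elements.length; i++) {
      out[i] = elements[i].split(level2, -1);
    }
    return out;
  }

  public static void main(String[] args) {
    // Array(Date) cell from the tests, level-1 delimiter overridden to "#".
    String[] dates = "2019-03-02#2019-03-03#2019-03-04#2019-03-05".split("#", -1);
    // Array(Struct) cell shaped like the array-of-struct test data, \001/\002 delimiters.
    String cell = "sunflowers\002img\0022019-03-30\0022019-03-30 17:22:31"
        + "\001roses\002img\0022019-03-30\0022019-03-30 17:22:31";
    String[][] structs = parseArrayOfStruct(cell, "\001", "\002");
    System.out.println(dates.length + " dates, " + structs.length + " structs, first name = " + structs[0][0]);
  }
}
```

The `-1` limit keeps trailing empty fields, so an element with an empty last struct member still yields the full field count.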
Karan980 commented on a change in pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#discussion_r543064324

########## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ##########

+  public Object getFormattedData(CarbonDimension dimension, Object row, SimpleDateFormat dateFormat,

Review comment: Done
Karan980 commented on a change in pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#discussion_r543064547

########## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ##########

+  @Test public void testReadDateAndTimestampColumnInStruct()
+      throws IOException, InvalidLoadOptionException, InterruptedException {

Review comment: Done
Karan980 commented on a change in pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#discussion_r543064576

########## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ##########

+    assert (4 == schema.getFieldsLength());
+    Field[] fields1 = schema.getFields();

Review comment: Done
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-745107047

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5162/
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-745109211 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3400/
Karan980 commented on a change in pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#discussion_r544167012

File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java

@@ -1741,6 +1742,246 @@
   public boolean accept(File dir, String name) {
   }
   }

+  @Test
+  public void testReadDateAndTimestampColumnInArray() {
+    String path = "./testWriteFiles";
+    try {
+      FileUtils.deleteDirectory(new File(path));
+      Field[] fields = new Field[11];
+      fields[0] = new Field("stringField", DataTypes.STRING);
+      fields[1] = new Field("shortField", DataTypes.SHORT);
+      fields[2] = new Field("dateField", DataTypes.DATE);
+      fields[3] = new Field("timeField", DataTypes.TIMESTAMP);
+      fields[4] = new Field("varcharField", DataTypes.VARCHAR);
+      fields[5] = new Field("arrayFieldDate", DataTypes.createArrayType(DataTypes.DATE));
+      fields[6] = new Field("arrayFieldTimestamp", DataTypes.createArrayType(DataTypes.TIMESTAMP));
+      Map<String, String> map = new HashMap<>();
+      map.put("complex_delimiter_level_1", "#");
+      CarbonWriter writer = CarbonWriter.builder()
+          .outputPath(path)
+          .withLoadOptions(map)
+          .withCsvInput(new Schema(fields))
+          .writtenBy("CarbonReaderTest")
+          .build();
+
+      for (int i = 0; i < 10; i++) {
+        String[] row2 = new String[]{
+            "robot" + (i % 10),
+            String.valueOf(i % 10000),
+            "2019-03-02",
+            "2019-02-12 03:03:34",
+            "varchar",
+            "2019-03-02#2019-03-03#2019-03-04#2019-03-05",
+            "2019-02-12 03:03:34#2019-02-12 03:03:38#2019-02-12 03:03:41#2019-02-12 03:12:34"
+        };
+        writer.write(row2);
+      }
+      writer.close();
+      File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
+        @Override
+        public boolean accept(File dir, String name) {
+          if (name == null) {
+            return false;
+          }
+          return name.endsWith("carbondata");
+        }
+      });
+      if (dataFiles == null || dataFiles.length < 1) {
+        throw new RuntimeException("Carbon data file not exists.");
+      }
+      Schema schema = CarbonSchemaReader
+          .readSchema(dataFiles[0].getAbsolutePath())
+          .asOriginOrder();
+      // Transform the schema
+      String[] strings = new String[schema.getFields().length];
+      for (int i = 0; i < schema.getFields().length; i++) {
+        strings[i] = (schema.getFields())[i].getFieldName();
+      }
+      // Read data
+      CarbonReader reader = CarbonReader
+          .builder(path)
+          .projection(strings)
+          .build();
+
+      int i = 0;
+      while (reader.hasNext()) {
+        Object[] row = (Object[]) reader.readNextRow();
+        assert (row[0].equals("robot" + i));
+        assert (row[2].equals("2019-03-02"));
+        assert (row[3].equals("2019-02-12 03:03:34"));
+        Object[] arrDate = (Object[]) row[5];
+        assert (arrDate[0].equals("2019-03-02"));
+        assert (arrDate[1].equals("2019-03-03"));
+        assert (arrDate[2].equals("2019-03-04"));
+        assert (arrDate[3].equals("2019-03-05"));
+        Object[] arrTimestamp = (Object[]) row[6];
+        assert (arrTimestamp[0].equals("2019-02-12 03:03:34"));
+        assert (arrTimestamp[1].equals("2019-02-12 03:03:38"));
+        assert (arrTimestamp[2].equals("2019-02-12 03:03:41"));
+        assert (arrTimestamp[3].equals("2019-02-12 03:12:34"));
+        i++;
+      }
+      Assert.assertEquals(i, 10);
+      reader.close();
+      FileUtils.deleteDirectory(new File(path));
+    } catch (Throwable e) {
+      e.printStackTrace();
+      Assert.fail(e.getMessage());
+    }
+  }
+
+  @Test public void testReadDateAndTimestampColumnInStruct()

Review comment: Done
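The assertions in this test compare date and timestamp values as formatted strings, which is exactly what the PR fixes: before the fix, dates inside complex columns surfaced as raw integers and timestamps as raw longs. A minimal sketch of that conversion, assuming DATE is stored as days since the Unix epoch and TIMESTAMP as milliseconds since the epoch rendered in UTC (the storage units are an assumption here, not confirmed by this thread):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class RawValueFormatter {
    // Assumption: the DATE column's raw value is an int counting days since 1970-01-01.
    static String formatDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch).toString(); // yyyy-MM-dd
    }

    // Assumption: the TIMESTAMP column's raw value is a long counting
    // milliseconds since the Unix epoch; formatted here in UTC.
    static String formatTimestamp(long millisSinceEpoch) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
            .withZone(ZoneOffset.UTC)
            .format(Instant.ofEpochMilli(millisSinceEpoch));
    }

    public static void main(String[] args) {
        System.out.println(formatDate(17957));               // 2019-03-02
        System.out.println(formatTimestamp(1549940614000L)); // 2019-02-12 03:03:34
    }
}
```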
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-746149611 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5181/
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-746151687 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3419/
Indhumathi27 commented on a change in pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#discussion_r544281555

File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java

@@ -126,6 +127,81 @@
   public T readNextRow() throws IOException, InterruptedException {
     return formatDateAndTimeStamp((Object []) row);
   }

+  /**
+   * This method converts the date and timestamp columns into right format. Before conversion date
+   * is present as integer and timestamp is present as long. This method also flattens complex
+   * columns and format the date/timestamp child present in them.
+   *
+   * @param dimension

Review comment: Please define the params or else remove it
Karan980 commented on a change in pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#discussion_r544323184 (quoting the same `formatDateAndTimeStamp` hunk in sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java)

Review comment: Done
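The javadoc under review describes a method that also recurses through complex values so that date/timestamp children inside arrays and structs get formatted. A rough sketch of that recursion (the class, method, and the "days since epoch" encoding are illustrative assumptions, not the actual CarbonData implementation):

```java
import java.time.LocalDate;

public class ComplexFormatter {
    // Walk nested Object[] values (arrays/structs flattened by the reader)
    // and convert any Integer, assumed here to be a DATE stored as days
    // since the Unix epoch, into its string form. Other values pass through.
    static Object format(Object value) {
        if (value instanceof Object[]) {
            Object[] children = (Object[]) value;
            for (int i = 0; i < children.length; i++) {
                children[i] = format(children[i]);
            }
            return children;
        }
        if (value instanceof Integer) {
            return LocalDate.ofEpochDay((Integer) value).toString();
        }
        return value;
    }

    public static void main(String[] args) {
        // An array containing a date and a nested array with another date.
        Object[] nested = new Object[]{17957, new Object[]{17958}};
        Object[] out = (Object[]) format(nested);
        System.out.println(out[0]);                 // 2019-03-02
        System.out.println(((Object[]) out[1])[0]); // 2019-03-03
    }
}
```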
CarbonDataQA2 commented on pull request #4046: URL: https://github.com/apache/carbondata/pull/4046#issuecomment-746503164 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5186/