GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2780 Carbondata 2982 support array string in schema

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added, or why no new tests are required?
  - How is it tested? Please attach the test report.
  - Is it a performance-related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-2982_supportArrayStringInSchema

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2780.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2780

----

commit d3ed337344d6af19398bf867ea24a98c36f0170d
Author: xubo245 <xubo29@...>
Date: 2018-09-28T03:47:22Z

    [CARBONDATA-2982] CarbonSchemaReader support array<string>

commit 7110a8bf5b2a55f9b2444366d291ffdc5f172585
Author: xubo245 <xubo29@...>
Date: 2018-09-28T04:01:47Z

    optimize

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/620/ ---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8881/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/812/ ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2780 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/623/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8884/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/815/ ---
Github user KanakaKumar commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2780#discussion_r221246385

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1435,5 +1435,99 @@ public void testReadWithFilterOfnonTransactionalwithsubfolders() throws IOExcept
     FileUtils.deleteDirectory(new File("./testWriteFiles"));
   }

+  @Test
+  public void testReadSchemaFromDataFileArrayString() {
+    String path = "./testWriteFiles";
+    try {
+      FileUtils.deleteDirectory(new File(path));
+
+      Field[] fields = new Field[11];
+      fields[0] = new Field("stringField", DataTypes.STRING);
+      fields[1] = new Field("shortField", DataTypes.SHORT);
+      fields[2] = new Field("intField", DataTypes.INT);
+      fields[3] = new Field("longField", DataTypes.LONG);
+      fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+      fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+      fields[6] = new Field("dateField", DataTypes.DATE);
+      fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+      fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2));
+      fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+      fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING));
+      Map<String, String> map = new HashMap<>();
+      map.put("complex_delimiter_level_1", "#");
+      CarbonWriter writer = CarbonWriter.builder()
+          .outputPath(path)
+          .withLoadOptions(map)
+          .withCsvInput(new Schema(fields))
+          .build();
+
+      for (int i = 0; i < 10; i++) {
+        String[] row2 = new String[]{
+            "robot" + (i % 10),
+            String.valueOf(i % 10000),
+            String.valueOf(i),
+            String.valueOf(Long.MAX_VALUE - i),
+            String.valueOf((double) i / 2),
+            String.valueOf(true),
+            "2019-03-02",
+            "2019-02-12 03:03:34",
+            "12.345",
+            "varchar",
+            "Hello#World#From#Carbon"
+        };
+        writer.write(row2);
+      }
+      writer.close();
+      File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
+        @Override
+        public boolean accept(File dir, String name) {
+          if (name == null) {
+            return false;
+          }
+          return name.endsWith("carbondata");
+        }
+      });
+      if (dataFiles == null || dataFiles.length < 1) {
+        throw new RuntimeException("Carbon data file does not exist.");
+      }
+      Schema schema = CarbonSchemaReader
+          .readSchemaInDataFile(dataFiles[0].getAbsolutePath())
+          .asOriginOrder();
+      // Transform the schema into a projection of field names
+      String[] strings = new String[schema.getFields().length];
+      for (int i = 0; i < schema.getFields().length; i++) {
+        strings[i] = (schema.getFields())[i].getFieldName();
+      }
+
+      // Read data
+      CarbonReader reader = CarbonReader
+          .builder(path, "_temp")
+          .projection(strings)
+          .build();
+
+      System.out.println("\nData:");
+      long day = 24L * 3600 * 1000;
+      int i = 0;
+      while (reader.hasNext()) {
+        Object[] row = (Object[]) reader.readNextRow();
+        System.out.println(String.format("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t",
+            i, row[0], row[1], row[2], row[3], row[4], row[5],
+            new Date(day * ((int) row[6])), new Timestamp((long) row[7] / 1000),
+            row[8], row[9]));
+        Object[] arr = (Object[]) row[10];
+        for (int j = 0; j < arr.length; j++) {
+          System.out.print(arr[j] + " ");
+        }
+        System.out.println();
+        i++;
+      }
+      System.out.println("\nFinished");

--- End diff --

Please remove the System.out content.

---
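The array cell in the rows above, "Hello#World#From#Carbon", becomes an array<string> value because the "complex_delimiter_level_1" load option is set to "#". As a self-contained sketch of that split (the class and method names here are illustrative, not part of the CarbonData SDK):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ArrayFieldSplit {

    // Split one CSV cell into its array<string> elements using the
    // level-1 complex delimiter ("#" in the test's load options).
    // Pattern.quote keeps delimiters such as "|" or "." from being
    // read as regex metacharacters; limit -1 preserves trailing
    // empty elements.
    static String[] splitArrayCell(String cell, String delimiter) {
        return cell.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String cell = "Hello#World#From#Carbon";
        System.out.println(Arrays.toString(splitArrayCell(cell, "#")));
        // → [Hello, World, From, Carbon]
    }
}
```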
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2780 LGTM except for the minor test case comment by @KanakaKumar. ---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2780#discussion_r221411423

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
(same hunk as quoted in the previous review comment)
--- End diff --

ok, done

---
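In the test's print loop, row[6] (a DATE) is multiplied by a millis-per-day constant and row[7] (a TIMESTAMP) is divided by 1000, i.e. the reader hands DATE back as int days since the Unix epoch and TIMESTAMP as long microseconds. A minimal, timezone-free sketch of the same decoding (CarbonValueDecode and its helper names are hypothetical; the epoch values correspond to the test rows' "2019-03-02" and "2019-02-12 03:03:34", assuming UTC):

```java
import java.time.Instant;
import java.time.LocalDate;

public class CarbonValueDecode {

    // DATE: int days since 1970-01-01 — the same arithmetic as
    // new Date(day * (int) row[6]) in the test, but timezone-free.
    static String decodeDate(int epochDays) {
        return LocalDate.ofEpochDay(epochDays).toString();
    }

    // TIMESTAMP: long microseconds since the epoch, hence the
    // "/ 1000" before new Timestamp(...) in the test.
    static Instant decodeTimestamp(long epochMicros) {
        return Instant.ofEpochMilli(epochMicros / 1000);
    }

    public static void main(String[] args) {
        System.out.println(decodeDate(17957));                  // 2019-03-02
        System.out.println(decodeTimestamp(1549940614000000L)); // 2019-02-12T03:03:34Z
    }
}
```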
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/647/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/842/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8910/ ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2780 @KanakaKumar @ajantha-bhat removed ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/651/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8914/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/846/ ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2780 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/655/ ---