GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2780 Carbondata 2982 support array string in schema

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added, or why no new tests are required?
  - How is it tested? Please attach the test report.
  - Is it a performance-related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-2982_supportArrayStringInSchema

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2780.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2780

----

commit d3ed337344d6af19398bf867ea24a98c36f0170d
Author: xubo245 <xubo29@...>
Date: 2018-09-28T03:47:22Z

    [CARBONDATA-2982] CarbonSchemaReader support array<string>

commit 7110a8bf5b2a55f9b2444366d291ffdc5f172585
Author: xubo245 <xubo29@...>
Date: 2018-09-28T04:01:47Z

    optimize

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/620/ ---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8881/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/812/ ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2780 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/623/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8884/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/815/ ---
Github user KanakaKumar commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2780#discussion_r221246385

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1435,5 +1435,99 @@ public void testReadWithFilterOfnonTransactionalwithsubfolders() throws IOExcept
     FileUtils.deleteDirectory(new File("./testWriteFiles"));
   }

+  @Test
+  public void testReadSchemaFromDataFileArrayString() {
+    String path = "./testWriteFiles";
+    try {
+      FileUtils.deleteDirectory(new File(path));
+
+      Field[] fields = new Field[11];
+      fields[0] = new Field("stringField", DataTypes.STRING);
+      fields[1] = new Field("shortField", DataTypes.SHORT);
+      fields[2] = new Field("intField", DataTypes.INT);
+      fields[3] = new Field("longField", DataTypes.LONG);
+      fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+      fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+      fields[6] = new Field("dateField", DataTypes.DATE);
+      fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+      fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2));
+      fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+      fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING));
+      Map<String, String> map = new HashMap<>();
+      map.put("complex_delimiter_level_1", "#");
+      CarbonWriter writer = CarbonWriter.builder()
+          .outputPath(path)
+          .withLoadOptions(map)
+          .withCsvInput(new Schema(fields))
+          .build();
+
+      for (int i = 0; i < 10; i++) {
+        String[] row2 = new String[]{
+            "robot" + (i % 10),
+            String.valueOf(i % 10000),
+            String.valueOf(i),
+            String.valueOf(Long.MAX_VALUE - i),
+            String.valueOf((double) i / 2),
+            String.valueOf(true),
+            "2019-03-02",
+            "2019-02-12 03:03:34",
+            "12.345",
+            "varchar",
+            "Hello#World#From#Carbon"
+        };
+        writer.write(row2);
+      }
+      writer.close();
+      File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
+        @Override
+        public boolean accept(File dir, String name) {
+          if (name == null) {
+            return false;
+          }
+          return name.endsWith("carbondata");
+        }
+      });
+      if (dataFiles == null || dataFiles.length < 1) {
+        throw new RuntimeException("Carbon data file does not exist.");
+      }
+      Schema schema = CarbonSchemaReader
+          .readSchemaInDataFile(dataFiles[0].getAbsolutePath())
+          .asOriginOrder();
+      // Transform the schema into a projection of field names
+      String[] strings = new String[schema.getFields().length];
+      for (int i = 0; i < schema.getFields().length; i++) {
+        strings[i] = (schema.getFields())[i].getFieldName();
+      }
+
+      // Read data
+      CarbonReader reader = CarbonReader
+          .builder(path, "_temp")
+          .projection(strings)
+          .build();
+
+      System.out.println("\nData:");
+      long day = 24L * 3600 * 1000;
+      int i = 0;
+      while (reader.hasNext()) {
+        Object[] row = (Object[]) reader.readNextRow();
+        System.out.println(String.format("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t",
+            i, row[0], row[1], row[2], row[3], row[4], row[5],
+            new Date(day * ((int) row[6])), new Timestamp((long) row[7] / 1000),
+            row[8], row[9]));
+        Object[] arr = (Object[]) row[10];
+        for (int j = 0; j < arr.length; j++) {
+          System.out.print(arr[j] + " ");
+        }
+        System.out.println();
+        i++;
+      }
+      System.out.println("\nFinished");

--- End diff --

Please remove the System.out content.

---
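The array cell in the rows above, "Hello#World#From#Carbon", becomes an array<string> value because the "complex_delimiter_level_1" load option is set to "#". As a self-contained sketch of that split (the class and method names here are illustrative, not part of the CarbonData SDK):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ArrayFieldSplit {

    // Split one CSV cell into its array<string> elements using the
    // level-1 complex delimiter ("#" in the test's load options).
    // Pattern.quote keeps delimiters such as "|" or "." from being
    // read as regex metacharacters; limit -1 preserves trailing
    // empty elements.
    static String[] splitArrayCell(String cell, String delimiter) {
        return cell.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String cell = "Hello#World#From#Carbon";
        System.out.println(Arrays.toString(splitArrayCell(cell, "#")));
        // → [Hello, World, From, Carbon]
    }
}
```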
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2780 LGTM except for the minor test case comment by @KanakaKumar. ---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2780#discussion_r221411423

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
(same hunk as quoted in the previous review comment)
--- End diff --

ok, done

---
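In the test's print loop, row[6] (a DATE) is multiplied by a millis-per-day constant and row[7] (a TIMESTAMP) is divided by 1000, i.e. the reader hands DATE back as int days since the Unix epoch and TIMESTAMP as long microseconds. A minimal, timezone-free sketch of the same decoding (CarbonValueDecode and its helper names are hypothetical; the epoch values correspond to the test rows' "2019-03-02" and "2019-02-12 03:03:34", assuming UTC):

```java
import java.time.Instant;
import java.time.LocalDate;

public class CarbonValueDecode {

    // DATE: int days since 1970-01-01 — the same arithmetic as
    // new Date(day * (int) row[6]) in the test, but timezone-free.
    static String decodeDate(int epochDays) {
        return LocalDate.ofEpochDay(epochDays).toString();
    }

    // TIMESTAMP: long microseconds since the epoch, hence the
    // "/ 1000" before new Timestamp(...) in the test.
    static Instant decodeTimestamp(long epochMicros) {
        return Instant.ofEpochMilli(epochMicros / 1000);
    }

    public static void main(String[] args) {
        System.out.println(decodeDate(17957));                  // 2019-03-02
        System.out.println(decodeTimestamp(1549940614000000L)); // 2019-02-12T03:03:34Z
    }
}
```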
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/647/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/842/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8910/ ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2780 @KanakaKumar @ajantha-bhat removed ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/651/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8914/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/846/ ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2780 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2780 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/655/ ---