
[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CSD...


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229161289

--- Diff: store/CSDK/test/main.cpp ---
@@ -220,6 +393,86 @@ bool tryCatchException(JNIEnv *env) {
*/
bool readFromS3(JNIEnv *env, char *argv[]) {
printf("\nRead data from S3:\n");
+ struct timeval start, build, read;
--- End diff --

OK, optimized. Please check.

---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229161334

--- Diff: store/CSDK/test/main.cpp ---
@@ -220,6 +393,86 @@ bool tryCatchException(JNIEnv *env) {
*/
bool readFromS3(JNIEnv *env, char *argv[]) {
printf("\nRead data from S3:\n");
+ struct timeval start, build, read;
+ gettimeofday(&start, NULL);
+
+ CarbonReader reader;
+
+ char *args[3];
+ // "your access key"
+ args[0] = argv[1];
+ // "your secret key"
+ args[1] = argv[2];
+ // "your endPoint"
+ args[2] = argv[3];
+
+ reader.builder(env, "s3a://sdk/WriterOutput/carbondata", "test");
+ reader.withHadoopConf("fs.s3a.access.key", argv[1]);
+ reader.withHadoopConf("fs.s3a.secret.key", argv[2]);
+ reader.withHadoopConf("fs.s3a.endpoint", argv[3]);
+ reader.build();
+
+ gettimeofday(&build, NULL);
+ int time = 1000000 * (build.tv_sec - start.tv_sec) + build.tv_usec - start.tv_usec;
+ double buildTime = time / 1000000.0;
+ printf("build time: %lf s\n", time / 1000000.0);
+
+ CarbonRow carbonRow(env);
+ int i = 0;
+ while (reader.hasNext()) {
+ jobject row = reader.readNextRow();
+ i++;
+ carbonRow.setCarbonRow(row);
+
+ printf("%s\t", carbonRow.getString(0));
+ printf("%d\t", carbonRow.getInt(1));
+ printf("%ld\t", carbonRow.getLong(2));
+ printf("%s\t", carbonRow.getVarchar(3));
+ jobjectArray arr = carbonRow.getArray(4);
+ jsize length = env->GetArrayLength(arr);
+ int j = 0;
+ for (j = 0; j < length; j++) {
+ jobject element = env->GetObjectArrayElement(arr, j);
+ char *str = (char *) env->GetStringUTFChars((jstring) element, JNI_FALSE);
+ printf("%s\t", str);
+ }
+ env->DeleteLocalRef(arr);
+ printf("%d\t", carbonRow.getShort(5));
+ printf("%d\t", carbonRow.getInt(6));
+ printf("%ld\t", carbonRow.getLong(7));
+ printf("%lf\t", carbonRow.getDouble(8));
+ bool bool1 = carbonRow.getBoolean(9);
+ if (bool1) {
+ printf("true\t");
+ } else {
+ printf("false\t");
+ }
+ printf("%s\t", carbonRow.getDecimal(10));
+ printf("%f\t", carbonRow.getFloat(11));
+ printf("\n");
+ env->DeleteLocalRef(row);
+ }
+ gettimeofday(&read, NULL);
+ time = 1000000 * (read.tv_sec - start.tv_sec) + read.tv_usec - start.tv_usec;
+ printf("total lines is %d: build time: %lf, read time is %lf s, average speed is %lf records/s\n",
+ i, buildTime, time / 1000000.0, i / (time / 1000000.0));
+
+ reader.close();
+}
+
+/**
+ * read data from S3
+ * parameter is ak sk endpoint
+ *
+ * @param env jni env
+ * @param argv argument vector
+ * @return
+ */
+bool readFromS3ForBigData(JNIEnv *env, char **argv) {
--- End diff --

removed this test case
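
For reference, the timing arithmetic in the quoted test (elapsed microseconds divided by 1000000.0, then records divided by seconds) can be sketched in Java as below. This is an illustrative sketch only, not code from the PR: the class and method names are hypothetical. It keeps the elapsed time in a `long` rather than an `int`, since a microsecond count held in a 32-bit `int` overflows after roughly 2147 seconds.

```java
public class ThroughputSketch {

  /** Converts a pair of System.nanoTime() readings to elapsed seconds. */
  static double seconds(long startNanos, long endNanos) {
    return (endNanos - startNanos) / 1_000_000_000.0;
  }

  /** Average speed in records per second; returns 0.0 for a non-positive duration. */
  static double recordsPerSecond(long records, double elapsedSeconds) {
    return elapsedSeconds > 0 ? records / elapsedSeconds : 0.0;
  }

  public static void main(String[] args) {
    long start = System.nanoTime();

    // Stand-in for the read loop: count some "rows".
    long rows = 0;
    for (int i = 0; i < 1_000_000; i++) {
      rows++;
    }

    long end = System.nanoTime();
    double elapsed = seconds(start, end);
    System.out.printf("total lines is %d, read time is %f s, average speed is %f records/s%n",
        rows, elapsed, recordsPerSecond(rows, elapsed));
  }
}
```

Note that the quoted C++ test measures the second interval from `start` rather than from `build`, so the figure it labels "read time" also includes the build time.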

---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229161378

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +93,20 @@ public T readNextRow() throws IOException, InterruptedException {
return currentReader.getCurrentValue();
}

+ /**
+ * Read and return next batch row objects
+ */
+ public Object[] readNextBatchRow() throws Exception {
+ validateReader();
+ int batch = Integer.parseInt(CarbonProperties.getInstance()
--- End diff --

OK, I added a default batch size.
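
A minimal sketch of what "adding a default batch size" could look like when the value comes from a string property. This is not the code from the PR: the property key, the default of 100, and the class name are all assumptions made for illustration, and plain `java.util.Properties` stands in for the project's own configuration class.

```java
import java.util.Properties;

public class BatchSizeDefaultSketch {

  // Hypothetical key and default value, chosen only for this example.
  static final String BATCH_SIZE_KEY = "example.reader.batch.size";
  static final int DEFAULT_BATCH_SIZE = 100;

  /** Returns the configured batch size, or the default if it is missing, malformed, or not positive. */
  static int resolveBatchSize(Properties props) {
    String raw = props.getProperty(BATCH_SIZE_KEY);
    if (raw == null) {
      return DEFAULT_BATCH_SIZE;
    }
    try {
      int parsed = Integer.parseInt(raw.trim());
      return parsed > 0 ? parsed : DEFAULT_BATCH_SIZE;
    } catch (NumberFormatException e) {
      return DEFAULT_BATCH_SIZE;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolveBatchSize(props)); // prints 100 (key not set)

    props.setProperty(BATCH_SIZE_KEY, "3");
    System.out.println(resolveBatchSize(props)); // prints 3
  }
}
```

Falling back on a malformed value, instead of letting `Integer.parseInt` throw, is a design choice of this sketch; failing fast would be equally defensible.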

---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229161840

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +93,20 @@ public T readNextRow() throws IOException, InterruptedException {
return currentReader.getCurrentValue();
}

+ /**
+ * Read and return next batch row objects
+ */
+ public Object[] readNextBatchRow() throws Exception {
+ validateReader();
+ int batch = Integer.parseInt(CarbonProperties.getInstance()
--- End diff --

No need for batch here; I removed it.

---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229162173

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1737,4 +1739,95 @@ public void testReadNextRowWithProjectionAndRowUtil() {
}
}

+ @Test
+ public void testReadNextBatchRow() {
+ String path = "./carbondata";
+ try {
+ FileUtils.deleteDirectory(new File(path));
+
+ Field[] fields = new Field[12];
+ fields[0] = new Field("stringField", DataTypes.STRING);
+ fields[1] = new Field("shortField", DataTypes.SHORT);
+ fields[2] = new Field("intField", DataTypes.INT);
+ fields[3] = new Field("longField", DataTypes.LONG);
+ fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+ fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+ fields[6] = new Field("dateField", DataTypes.DATE);
+ fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+ fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2));
+ fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+ fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING));
+ fields[11] = new Field("floatField", DataTypes.FLOAT);
+ Map<String, String> map = new HashMap<>();
+ map.put("complex_delimiter_level_1", "#");
+ CarbonWriter writer = CarbonWriter.builder()
+ .outputPath(path)
+ .withLoadOptions(map)
+ .withCsvInput(new Schema(fields))
+ .writtenBy("CarbonReaderTest")
+ .build();
+
+ for (int i = 0; i < 10; i++) {
+ String[] row2 = new String[]{
+ "robot" + (i % 10),
+ String.valueOf(i % 10000),
+ String.valueOf(i),
+ String.valueOf(Long.MAX_VALUE - i),
+ String.valueOf((double) i / 2),
+ String.valueOf(true),
+ "2019-03-02",
+ "2019-02-12 03:03:34",
+ "12.345",
+ "varchar",
+ "Hello#World#From#Carbon",
+ "1.23"
+ };
+ writer.write(row2);
+ }
+ writer.close();
+
+ // Read data
+ CarbonReader reader = CarbonReader
+ .builder(path, "_temp")
+ .withBatch(3)
+ .build();
+
+ int i = 0;
+ while (reader.hasNext()) {
+ Object[] batch = reader.readNextBatchRow();
+
+ for (int j = 0; j < batch.length; j++) {
--- End diff --

OK, done.
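
The quoted test writes 10 rows and reads them back with `withBatch(3)`. Assuming batch reads return up to the configured number of rows and a shorter final batch (an assumption, since the rest of the test is cut off in the diff), the loop would see batches of 3, 3, 3 and 1. The sketch below illustrates that pattern with a hypothetical `BatchIterator` over an in-memory list; it is not CarbonData's implementation and uses none of its classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchReadSketch {

  /** Hypothetical stand-in for a batch reader, built on a plain Iterator. */
  static final class BatchIterator<T> {
    private final Iterator<T> rows;
    private final int batchSize;

    BatchIterator(Iterator<T> rows, int batchSize) {
      if (batchSize <= 0) {
        throw new IllegalArgumentException("batchSize must be positive");
      }
      this.rows = rows;
      this.batchSize = batchSize;
    }

    boolean hasNext() {
      return rows.hasNext();
    }

    /** Returns up to batchSize rows; the last batch may be shorter. */
    Object[] readNextBatch() {
      List<T> batch = new ArrayList<>(batchSize);
      while (batch.size() < batchSize && rows.hasNext()) {
        batch.add(rows.next());
      }
      return batch.toArray();
    }
  }

  /** Reads totalRows synthetic rows in batches and records each batch length. */
  static List<Integer> batchSizes(int totalRows, int batchSize) {
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < totalRows; i++) {
      rows.add(i);
    }
    BatchIterator<Integer> reader = new BatchIterator<>(rows.iterator(), batchSize);
    List<Integer> sizes = new ArrayList<>();
    while (reader.hasNext()) {
      sizes.add(reader.readNextBatch().length);
    }
    return sizes;
  }

  public static void main(String[] args) {
    System.out.println(batchSizes(10, 3)); // prints [3, 3, 3, 1]
  }
}
```

A test that counts rows across batches should therefore sum the batch lengths rather than multiply the number of batches by the batch size.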

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9394/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1342/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1130/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1131/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1343/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9395/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1132/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1344/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9396/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2816

@ajantha-bhat @KanakaKumar @kunal642 Optimized, and CI passes. Please review again.

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2816

retest this please

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1151/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9415/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1362/

---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

In reply to this post by qiuchenjian-2

Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2816

retest this please

---
