[GitHub] carbondata pull request #2816: [CARBONDATA-300] Suppor read batch row in CSD...

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2816#discussion_r229161289
 
    --- Diff: store/CSDK/test/main.cpp ---
    @@ -220,6 +393,86 @@ bool tryCatchException(JNIEnv *env) {
      */
     bool readFromS3(JNIEnv *env, char *argv[]) {
         printf("\nRead data from S3:\n");
    +    struct timeval start, build, read;
    --- End diff --
   
    OK, optimized; please check.


---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2816#discussion_r229161334
 
    --- Diff: store/CSDK/test/main.cpp ---
    @@ -220,6 +393,86 @@ bool tryCatchException(JNIEnv *env) {
      */
     bool readFromS3(JNIEnv *env, char *argv[]) {
         printf("\nRead data from S3:\n");
    +    struct timeval start, build, read;
    +    gettimeofday(&start, NULL);
    +
    +    CarbonReader reader;
    +
    +    char *args[3];
    +    // "your access key"
    +    args[0] = argv[1];
    +    // "your secret key"
    +    args[1] = argv[2];
    +    // "your endPoint"
    +    args[2] = argv[3];
    +
    +    reader.builder(env, "s3a://sdk/WriterOutput/carbondata", "test");
    +    reader.withHadoopConf("fs.s3a.access.key", argv[1]);
    +    reader.withHadoopConf("fs.s3a.secret.key", argv[2]);
    +    reader.withHadoopConf("fs.s3a.endpoint", argv[3]);
    +    reader.build();
    +
    +    gettimeofday(&build, NULL);
    +    int time = 1000000 * (build.tv_sec - start.tv_sec) + build.tv_usec - start.tv_usec;
    +    double buildTime = time / 1000000.0;
    +    printf("build time: %lf s\n", time / 1000000.0);
    +
    +    CarbonRow carbonRow(env);
    +    int i = 0;
    +    while (reader.hasNext()) {
    +        jobject row = reader.readNextRow();
    +        i++;
    +        carbonRow.setCarbonRow(row);
    +
    +        printf("%s\t", carbonRow.getString(0));
    +        printf("%d\t", carbonRow.getInt(1));
    +        printf("%ld\t", carbonRow.getLong(2));
    +        printf("%s\t", carbonRow.getVarchar(3));
    +        jobjectArray arr = carbonRow.getArray(4);
    +        jsize length = env->GetArrayLength(arr);
    +        int j = 0;
    +        for (j = 0; j < length; j++) {
    +            jobject element = env->GetObjectArrayElement(arr, j);
    +            char *str = (char *) env->GetStringUTFChars((jstring) element, JNI_FALSE);
    +            printf("%s\t", str);
    +        }
    +        env->DeleteLocalRef(arr);
    +        printf("%d\t", carbonRow.getShort(5));
    +        printf("%d\t", carbonRow.getInt(6));
    +        printf("%ld\t", carbonRow.getLong(7));
    +        printf("%lf\t", carbonRow.getDouble(8));
    +        bool bool1 = carbonRow.getBoolean(9);
    +        if (bool1) {
    +            printf("true\t");
    +        } else {
    +            printf("false\t");
    +        }
    +        printf("%s\t", carbonRow.getDecimal(10));
    +        printf("%f\t", carbonRow.getFloat(11));
    +        printf("\n");
    +        env->DeleteLocalRef(row);
    +    }
    +    gettimeofday(&read, NULL);
    +    time = 1000000 * (read.tv_sec - start.tv_sec) + read.tv_usec - start.tv_usec;
    +    printf("total lines is %d: build time: %lf, read time is %lf s, average speed is %lf records/s\n",
    +           i, buildTime, time / 1000000.0, i / (time / 1000000.0));
    +
    +    reader.close();
    +}
    +
    +/**
    + * read data from S3
    + * parameter is ak sk endpoint
    + *
    + * @param env jni env
    + * @param argv argument vector
    + * @return
    + */
    +bool readFromS3ForBigData(JNIEnv *env, char **argv) {
    --- End diff --
   
    Removed this test case.
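The diff times the build and read phases with `gettimeofday` and reports an average speed of `i / seconds` records per second. The sketch below shows that arithmetic in C++. It keeps elapsed time as a `double`, so fractional seconds are not truncated and the value matches a `%lf` format. The helper names `elapsedSeconds`, `recordsPerSecond` and `reportThroughput` are hypothetical and are not part of the CSDK.

```cpp
#include <sys/time.h>  // gettimeofday, struct timeval (POSIX)
#include <cassert>
#include <cstdio>

// Hypothetical helper: seconds elapsed between two gettimeofday() samples.
// A double keeps the fractional part; storing it in an int would truncate
// it to whole seconds, and an int microsecond count overflows after ~2147 s.
double elapsedSeconds(const timeval &start, const timeval &end) {
    double secs = static_cast<double>(end.tv_sec - start.tv_sec);
    double usecs = static_cast<double>(end.tv_usec - start.tv_usec);
    return secs + usecs / 1000000.0;
}

// Hypothetical helper: average throughput.
// Returns 0 for a non-positive interval instead of dividing by zero.
double recordsPerSecond(long records, double seconds) {
    assert(records >= 0);
    if (seconds <= 0.0) {
        return 0.0;
    }
    return static_cast<double>(records) / seconds;
}

// Reporting step modeled on the printf in the diff.
// As in the diff, the second interval is measured from the start, so it includes the build.
void reportThroughput(long records, const timeval &start, const timeval &built,
                      const timeval &done) {
    double buildTime = elapsedSeconds(start, built);
    double totalTime = elapsedSeconds(start, done);
    std::printf("total lines is %ld: build time: %lf s, total time is %lf s, "
                "average speed is %lf records/s\n",
                records, buildTime, totalTime, recordsPerSecond(records, totalTime));
}
```

With `double` values and `%lf` conversions, every printf argument matches its format specifier. Printing an `int` with `%lf` is undefined behavior.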


---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2816#discussion_r229161378
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
    @@ -90,6 +93,20 @@ public T readNextRow() throws IOException, InterruptedException {
         return currentReader.getCurrentValue();
       }
     
    +  /**
    +   * Read and return next batch row objects
    +   */
    +  public Object[] readNextBatchRow() throws Exception {
    +    validateReader();
    +    int batch = Integer.parseInt(CarbonProperties.getInstance()
    --- End diff --
   
    OK, I added a default batch size.
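In this version of `readNextBatchRow`, the batch size comes from `CarbonProperties` through `Integer.parseInt`, and the reply adds a default. The sketch below applies the same fall-back-to-default idea in C++. `resolveBatchSize` is hypothetical. The default is passed in as a parameter because the thread does not state the real default value.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a configured batch size, falling back to
// defaultSize when the value is missing, malformed, or not positive.
int resolveBatchSize(const std::string &configured, int defaultSize) {
    assert(defaultSize > 0);
    if (configured.empty()) {
        return defaultSize;
    }
    try {
        std::size_t consumed = 0;
        int value = std::stoi(configured, &consumed);
        // Reject trailing garbage such as "12x" and non-positive sizes.
        if (consumed != configured.size() || value <= 0) {
            return defaultSize;
        }
        return value;
    } catch (const std::logic_error &) {
        // std::stoi throws std::invalid_argument or std::out_of_range,
        // both of which derive from std::logic_error.
        return defaultSize;
    }
}
```

Falling back to the default, instead of letting the parse error escape, keeps a bad or unset property from failing every read. Whether that is preferable to failing fast is a design choice.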


---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2816#discussion_r229161840
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
    @@ -90,6 +93,20 @@ public T readNextRow() throws IOException, InterruptedException {
         return currentReader.getCurrentValue();
       }
     
    +  /**
    +   * Read and return next batch row objects
    +   */
    +  public Object[] readNextBatchRow() throws Exception {
    +    validateReader();
    +    int batch = Integer.parseInt(CarbonProperties.getInstance()
    --- End diff --
   
    The batch size is not needed here, so I removed it.
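After this change, `readNextBatchRow` no longer looks up a batch size itself. In `testReadNextBatchRow` the size is set once on the builder with `.withBatch(3)`. The sketch below is a hypothetical C++ version of that build-time configuration, in which the option is validated when the reader is built rather than on every read. `ReaderBuilder` and `ReaderConfig` are illustrative names, not CSDK types, and the default is a constructor argument because the real value is not shown.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical immutable settings produced by the builder.
struct ReaderConfig {
    int batchSize;
};

// Hypothetical builder: the batch size is chosen once, at build time.
class ReaderBuilder {
public:
    explicit ReaderBuilder(int defaultBatch) : batch_(defaultBatch) {}

    // Overrides the default batch size; returns *this for chaining.
    ReaderBuilder &withBatch(int batch) {
        batch_ = batch;
        return *this;
    }

    // Validates once here, so per-batch reads need no further checks.
    ReaderConfig build() const {
        if (batch_ <= 0) {
            throw std::invalid_argument("batch size must be positive");
        }
        return ReaderConfig{batch_};
    }

private:
    int batch_;
};
```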


---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2816#discussion_r229162173
 
    --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
    @@ -1737,4 +1739,95 @@ public void testReadNextRowWithProjectionAndRowUtil() {
         }
       }
     
    +  @Test
    +  public void testReadNextBatchRow() {
    +    String path = "./carbondata";
    +    try {
    +      FileUtils.deleteDirectory(new File(path));
    +
    +      Field[] fields = new Field[12];
    +      fields[0] = new Field("stringField", DataTypes.STRING);
    +      fields[1] = new Field("shortField", DataTypes.SHORT);
    +      fields[2] = new Field("intField", DataTypes.INT);
    +      fields[3] = new Field("longField", DataTypes.LONG);
    +      fields[4] = new Field("doubleField", DataTypes.DOUBLE);
    +      fields[5] = new Field("boolField", DataTypes.BOOLEAN);
    +      fields[6] = new Field("dateField", DataTypes.DATE);
    +      fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
    +      fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2));
    +      fields[9] = new Field("varcharField", DataTypes.VARCHAR);
    +      fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING));
    +      fields[11] = new Field("floatField", DataTypes.FLOAT);
    +      Map<String, String> map = new HashMap<>();
    +      map.put("complex_delimiter_level_1", "#");
    +      CarbonWriter writer = CarbonWriter.builder()
    +          .outputPath(path)
    +          .withLoadOptions(map)
    +          .withCsvInput(new Schema(fields))
    +          .writtenBy("CarbonReaderTest")
    +          .build();
    +
    +      for (int i = 0; i < 10; i++) {
    +        String[] row2 = new String[]{
    +            "robot" + (i % 10),
    +            String.valueOf(i % 10000),
    +            String.valueOf(i),
    +            String.valueOf(Long.MAX_VALUE - i),
    +            String.valueOf((double) i / 2),
    +            String.valueOf(true),
    +            "2019-03-02",
    +            "2019-02-12 03:03:34",
    +            "12.345",
    +            "varchar",
    +            "Hello#World#From#Carbon",
    +            "1.23"
    +        };
    +        writer.write(row2);
    +      }
    +      writer.close();
    +
    +      // Read data
    +      CarbonReader reader = CarbonReader
    +          .builder(path, "_temp")
    +          .withBatch(3)
    +          .build();
    +
    +      int i = 0;
    +      while (reader.hasNext()) {
    +        Object[] batch = reader.readNextBatchRow();
    +
    +        for (int j = 0; j < batch.length; j++) {
    --- End diff --
   
    OK, done.
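The test writes 10 rows and reads them back with `.withBatch(3)`, calling `readNextBatchRow()` while `hasNext()` is true. The thread does not show the reader's internals. Assuming each call returns at most the configured number of rows and the last call returns the remainder, the reads split as 3, 3, 3 and 1. The sketch below is a self-contained C++ version of that loop over an in-memory row list. `InMemoryBatchReader` and `batchSizes` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader that hands out rows in batches of at most batchSize.
class InMemoryBatchReader {
public:
    InMemoryBatchReader(std::vector<std::string> rows, std::size_t batchSize)
        : rows_(std::move(rows)), batchSize_(batchSize), next_(0) {
        assert(batchSize_ > 0);
    }

    bool hasNext() const { return next_ < rows_.size(); }

    // Returns the next batch; only the final batch may be shorter than batchSize.
    std::vector<std::string> readNextBatchRow() {
        std::size_t end = std::min(next_ + batchSize_, rows_.size());
        std::vector<std::string> batch(rows_.begin() + next_, rows_.begin() + end);
        next_ = end;
        return batch;
    }

private:
    std::vector<std::string> rows_;
    std::size_t batchSize_;
    std::size_t next_;
};

// Drains the reader and records the size of each batch, like the test's while loop.
std::vector<std::size_t> batchSizes(InMemoryBatchReader &reader) {
    std::vector<std::size_t> sizes;
    while (reader.hasNext()) {
        sizes.push_back(reader.readNextBatchRow().size());
    }
    return sizes;
}
```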


---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9394/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1342/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1130/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1131/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1343/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9395/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1132/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1344/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9396/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user xubo245 commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    @ajantha-bhat @KanakaKumar @kunal642 I have optimized the code and CI passes; please review again.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user xubo245 commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1151/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9415/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1362/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user xubo245 commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    retest this please


---