[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...


qiuchenjian-2
GitHub user xubo245 opened a pull request:

    https://github.com/apache/carbondata/pull/2792

      [CARBONDATA-2981] Support read primitive data type in CSDK

      [CARBONDATA-2981] Support read primitive data type in CSDK
       
           1. Support readNextCarbonRow.
           2. Support reading primitive data types in C code from the Java side: int, double, short, long, string.
           3. Support and convert some additional data types: date, timestamp, varchar, decimal, array<T>.
                 3.1 Return int when reading date.
                 3.2 Return long when reading timestamp.
                 3.3 Return string when reading varchar.
                 3.4 Return string when reading decimal.
                 3.5 Support array<string>.
    This PR is based on PR2738; the related commit will be removed after PR2738 is merged.
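
As a rough illustration of items 3.1, 3.2 and 3.4, a caller that receives these raw values might decode them as sketched here. The class and method names are hypothetical and not part of the PR. The thread does not state the units, so the code assumes the int is a day count since 1970-01-01 and the long is microseconds since the epoch; both assumptions should be checked against the CSDK documentation.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.LocalDate;

/** Hypothetical helper; not part of CarbonData or the CSDK. */
final class CsdkValueDecoding {

    // Assumption: a date column arrives as days since 1970-01-01.
    static LocalDate decodeDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    // Assumption: a timestamp column arrives as microseconds since the epoch.
    static Instant decodeTimestamp(long microsSinceEpoch) {
        // floorDiv/floorMod keep the nanosecond part non-negative for pre-1970 values.
        long seconds = Math.floorDiv(microsSinceEpoch, 1_000_000L);
        long micros = Math.floorMod(microsSinceEpoch, 1_000_000L);
        return Instant.ofEpochSecond(seconds, micros * 1_000L);
    }

    // A decimal column arrives as a string (item 3.4); parsing it restores the scale.
    static BigDecimal decodeDecimal(String text) {
        return new BigDecimal(text);
    }

    public static void main(String[] args) {
        System.out.println(decodeDate(17532));
        System.out.println(decodeTimestamp(1_500_000L));
        System.out.println(decodeDecimal("12.340"));
    }
}
```

Passing decimals as strings avoids precision loss across the JNI boundary, at the cost of a parse on the receiving side.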
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     Yes, a new interface is added.
     - [ ] Any backward compatibility impacted?
     No
     - [ ] Document update required?
     Yes
     - [ ] Testing done
            Test cases in the C code are updated.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     JIRA 2951


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-2981_primitiveDataType

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2792
   
----
commit 5f93bfc999dc7309671d59b1e73e4085d2684d58
Author: xubo245 <xubo29@...>
Date:   2018-09-20T10:35:34Z

    [CARBONDATA-2952] Provide CarbonReader C++ interface for SDK
   
    1.init carbonreader,config data path and tablename
    2.config ak sk endpoing for S3
    3.configure projection
    4.build carbon reader
    5.hasNext
    6.readNextRow
    7.close
   
    optimize

commit cd181b91c33d32e66a3f0026f1e3167a148b37e7
Author: xubo245 <xubo29@...>
Date:   2018-09-29T09:06:03Z

    [CARBONDATA-2981] Support read primitive data type in CSDK
   
       1.support readNextCarbonRow
       2.support read different primitive data type in c code from java side: int double short long string
       3.support some data type and convert: date timestamp varchar decimal array<T>
   
    su

----


---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/657/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8919/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/851/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/661/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/856/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8924/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/662/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/663/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8926/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/664/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8927/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/859/



---

[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/2792
 
    @xubo245 Please add a link to the CSDK guide in the README file.


---

[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2792#discussion_r222686566
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/row/CarbonRow.java ---
    @@ -57,6 +74,154 @@ public String getString(int ordinal) {
         return (String) data[ordinal];
       }
     
    +  /**
    +   * get short type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public short getShort(int ordinal) {
    +    return (short) data[ordinal];
    +  }
    +
    +  /**
    +   * get int data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public int getInt(int ordinal) {
    +    return (Integer) data[ordinal];
    +  }
    +
    +  /**
    +   * get long data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public long getLong(int ordinal) {
    +    return (long) data[ordinal];
    +  }
    +
    +  /**
    +   * get array data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public Object[] getArray(int ordinal) {
    +    return (Object[]) data[ordinal];
    +  }
    +
    +  /**
    +   * get double data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public double getDouble(int ordinal) {
    +    return (double) data[ordinal];
    +  }
    +
    +  /**
    +   * get boolean data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public boolean getBoolean(int ordinal) {
    +    return (boolean) data[ordinal];
    +  }
    +
    +  /**
    +   * get byte data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public Byte getByte(int ordinal) {
    +    return (Byte) data[ordinal];
    +  }
    +
    +  /**
    +   * get float data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public float getFloat(int ordinal) {
    +    return (float) data[ordinal];
    +  }
    +
    +  /**
    +   * get varchar data type data by ordinal
    +   * This is for CSDK
    +   * JNI don't support varchar, so carbon convert decimal to string
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public String getVarchar(int ordinal) {
    +    return (String) data[ordinal];
    +  }
    +
    +  /**
    +   * get decimal data type data by ordinal
    +   * This is for CSDK
    +   * JNI don't support Decimal, so carbon convert decimal to string
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public String getDecimal(int ordinal) {
    +    return ((BigDecimal) data[ordinal]).toString();
    +  }
    +
    +  /**
    +   * get data type by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public DataType getDataType(int ordinal) {
    +    return dataTypes[ordinal];
    +  }
    +
    +  /**
    +   * get data type name by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public String getDataTypeName(int ordinal) {
    +    return dataTypes[ordinal].getName();
    +  }
    +
    +  /**
    +   * get element type name by ordinal
    +   * child schema data type name
    +   * for example: return STRING if it's Array<String> in java
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return element type name
    +   */
    +  public String getElementTypeName(int ordinal) {
    --- End diff --
   
    If this method works only for Array, we can rename it to getArrayElementTypeName and throw an exception if the column is not an array type. Returning null causes integration errors for unsupported data types.
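
The suggestion above could look roughly like the following sketch. `ColumnType` and `TypedRowSketch` are hypothetical stand-ins written for illustration; they are not CarbonData's `DataType` or `CarbonRow`, and the choice of `UnsupportedOperationException` is an assumption, since the comment does not name an exception type.

```java
import java.util.Objects;

/** Hypothetical stand-in for a column data type; not CarbonData's DataType. */
final class ColumnType {
    final String name;
    final ColumnType elementType; // non-null only for array types

    private ColumnType(String name, ColumnType elementType) {
        this.name = name;
        this.elementType = elementType;
    }

    static ColumnType primitive(String name) {
        return new ColumnType(name, null);
    }

    static ColumnType arrayOf(ColumnType elementType) {
        return new ColumnType("ARRAY", Objects.requireNonNull(elementType));
    }

    boolean isArray() {
        return elementType != null;
    }
}

/** Hypothetical row holding only per-column type metadata. */
final class TypedRowSketch {
    private final ColumnType[] dataTypes;

    TypedRowSketch(ColumnType[] dataTypes) {
        this.dataTypes = dataTypes.clone();
    }

    // Fails loudly for non-array columns instead of returning null.
    String getArrayElementTypeName(int ordinal) {
        ColumnType type = dataTypes[ordinal];
        if (!type.isArray()) {
            throw new UnsupportedOperationException(
                "column " + ordinal + " has type " + type.name + ", not an array");
        }
        return type.elementType.name;
    }

    public static void main(String[] args) {
        TypedRowSketch row = new TypedRowSketch(new ColumnType[] {
            ColumnType.primitive("INT"),
            ColumnType.arrayOf(ColumnType.primitive("STRING"))
        });
        System.out.println(row.getArrayElementTypeName(1));
    }
}
```

Throwing at the call site means a caller on the C side sees a pending Java exception it can check for, rather than a null that only fails later.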


---

[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2792#discussion_r222689040
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
    @@ -116,6 +117,25 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context)
         return readSupport.readRow(carbonIterator.next());
       }
     
    +  /**
    +   * get CarbonRow data, including data and datatypes
    +   *
    +   * @return carbonRow object or data array or T
    +   * @throws IOException
    +   * @throws InterruptedException
    +   */
    +  public T getCarbonRow() throws IOException, InterruptedException {
    --- End diff --
   
    I think that instead of the confusing T, we can define the return type as CarbonRow itself.


---

[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2792#discussion_r222690123
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/readsupport/impl/DictionaryDecodeReadSupport.java ---
    @@ -81,7 +82,24 @@
             data[i] = dictionaries[i].getDictionaryValueForKey((int) data[i]);
           }
         }
    -    return (T)data;
    +    return (T) data;
    +  }
    +
    +  /**
    +   * get carbonRow, including data and datatpes
    +   *
    +   * @param data row data
    +   * @return CarbonRow Object
    +   */
    +  public T readCarbonRow(Object[] data) {
    --- End diff --
   
    Instead of changing DictionaryDecodeReadSupport and the other class hierarchies, I suggest using a new Row class as a utility that provides just the required methods, to avoid impact on the base code.


---

[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2792#discussion_r222691023
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/row/CarbonRow.java ---
    @@ -18,8 +18,11 @@
     package org.apache.carbondata.core.datastore.row;
     
     import java.io.Serializable;
    +import java.math.BigDecimal;
    --- End diff --
   
    CarbonRow has various fields such as data, rawData, rangeID, etc. It does not seem intended as an end-user API.
    I think we can add a simple Row class for SDK scope.
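
A minimal sketch of what such an SDK-scoped row might look like follows. The name `SdkRow` and its method set are hypothetical, chosen to mirror the getters in the diff; this is not the class that was eventually merged.

```java
import java.math.BigDecimal;

/** Hypothetical SDK-facing row: wraps one decoded record and nothing else. */
final class SdkRow {
    private final Object[] data;

    SdkRow(Object[] data) {
        // Defensive copy so later changes to the caller's array do not leak in.
        this.data = data.clone();
    }

    int length() {
        return data.length;
    }

    int getInt(int ordinal) {
        return (Integer) data[ordinal];
    }

    long getLong(int ordinal) {
        return (Long) data[ordinal];
    }

    double getDouble(int ordinal) {
        return (Double) data[ordinal];
    }

    String getString(int ordinal) {
        return (String) data[ordinal];
    }

    // Decimals are exposed as strings, following the approach in the diff.
    String getDecimal(int ordinal) {
        return ((BigDecimal) data[ordinal]).toString();
    }

    Object[] getArray(int ordinal) {
        return (Object[]) data[ordinal];
    }

    public static void main(String[] args) {
        SdkRow row = new SdkRow(new Object[] {
            1, 2L, "x", new BigDecimal("1.50"), new Object[] {"a", "b"}
        });
        System.out.println(row.getInt(0) + " " + row.getString(2) + " " + row.getDecimal(3));
    }
}
```

Keeping the wrapper separate leaves internal fields such as rawData and rangeID out of the public surface, which is the concern raised in the comment.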


---

[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2792#discussion_r222691292
 
    --- Diff: store/CSDK/CarbonReader.cpp ---
    @@ -89,10 +89,18 @@ jboolean CarbonReader::hasNext() {
         return hasNext;
     }
     
    +jobject CarbonReader::readNextCarbonRow() {
    +    jclass carbonReader = jniEnv->GetObjectClass(carbonReaderObject);
    +    jmethodID readNextCarbonRowID = jniEnv->GetMethodID(carbonReader, "readNextCarbonRow",
    +        "()Lorg/apache/carbondata/core/datastore/row/CarbonRow;");
    +    jobject carbonRow = (jobject) jniEnv->CallObjectMethod(carbonReaderObject, readNextCarbonRowID);
    +    return carbonRow;
    +}
    +
     jobjectArray CarbonReader::readNextRow() {
         jclass carbonReader = jniEnv->GetObjectClass(carbonReaderObject);
    -    jmethodID readNextRow2ID = jniEnv->GetMethodID(carbonReader, "readNextStringRow", "()[Ljava/lang/Object;");
    -    jobjectArray row = (jobjectArray) jniEnv->CallObjectMethod(carbonReaderObject, readNextRow2ID);
    +    jmethodID readNextStringRowID = jniEnv->GetMethodID(carbonReader, "readNextStringRow", "()[Ljava/lang/Object;");
    --- End diff --
   
    We can remove "readNextStringRow" and add a utility method in JNI to achieve the same.


---

[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2792#discussion_r223238343
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/row/CarbonRow.java ---
    @@ -57,6 +74,154 @@ public String getString(int ordinal) {
         return (String) data[ordinal];
       }
     
    +  /**
    +   * get short type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public short getShort(int ordinal) {
    +    return (short) data[ordinal];
    +  }
    +
    +  /**
    +   * get int data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public int getInt(int ordinal) {
    +    return (Integer) data[ordinal];
    +  }
    +
    +  /**
    +   * get long data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public long getLong(int ordinal) {
    +    return (long) data[ordinal];
    +  }
    +
    +  /**
    +   * get array data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public Object[] getArray(int ordinal) {
    +    return (Object[]) data[ordinal];
    +  }
    +
    +  /**
    +   * get double data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public double getDouble(int ordinal) {
    +    return (double) data[ordinal];
    +  }
    +
    +  /**
    +   * get boolean data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public boolean getBoolean(int ordinal) {
    +    return (boolean) data[ordinal];
    +  }
    +
    +  /**
    +   * get byte data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public Byte getByte(int ordinal) {
    +    return (Byte) data[ordinal];
    +  }
    +
    +  /**
    +   * get float data type data by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public float getFloat(int ordinal) {
    +    return (float) data[ordinal];
    +  }
    +
    +  /**
    +   * get varchar data type data by ordinal
    +   * This is for CSDK
    +   * JNI don't support varchar, so carbon convert decimal to string
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public String getVarchar(int ordinal) {
    +    return (String) data[ordinal];
    +  }
    +
    +  /**
    +   * get decimal data type data by ordinal
    +   * This is for CSDK
    +   * JNI don't support Decimal, so carbon convert decimal to string
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public String getDecimal(int ordinal) {
    +    return ((BigDecimal) data[ordinal]).toString();
    +  }
    +
    +  /**
    +   * get data type by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public DataType getDataType(int ordinal) {
    +    return dataTypes[ordinal];
    +  }
    +
    +  /**
    +   * get data type name by ordinal
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return
    +   */
    +  public String getDataTypeName(int ordinal) {
    +    return dataTypes[ordinal].getName();
    +  }
    +
    +  /**
    +   * get element type name by ordinal
    +   * child schema data type name
    +   * for example: return STRING if it's Array<String> in java
    +   *
    +   * @param ordinal the data index of carbonRow
    +   * @return element type name
    +   */
    +  public String getElementTypeName(int ordinal) {
    --- End diff --
   
    ok, done


---