[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3193: [CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK

GitBox
ajantha-bhat commented on a change in pull request #3193: [CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK
URL: https://github.com/apache/carbondata/pull/3193#discussion_r281575711
 
 

 ##########
 File path: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java
 ##########
 @@ -94,6 +96,43 @@ public T readNextRow() throws IOException, InterruptedException {
     return currentReader.getCurrentValue();
   }
 
+  /**
+   * Carbon reader will fill the arrow vector after reading the carbondata files.
+   * This arrow byte[] can be used to create arrow table and used for in memory analytics
+   *
+   * Note: create a reader at blocklet level, so that arrow byte[] will not exceed INT_MAX
+   *
+   * @param carbonSchema
+   * @return
+   * @throws Exception
+   */
+  public byte[] readArrowBatch(Schema carbonSchema) throws Exception {
+    ArrowConverter arrowConverter = new ArrowConverter(carbonSchema, 10000);
+    while (hasNext()) {
+      arrowConverter.addToArrowBuffer(readNextBatchRow());
+    }
+    return arrowConverter.toSerializeArray();
+  }
+
+  /**
+   * Carbon reader will fill the arrow vector after reading carbondata files.
+   * Here unsafe memory address will be returned instead of byte[],
+   * so that this address can be sent across java to python or c modules and
+   * can directly read the content from this unsafe memory
+   *
+   * Note: create a reader at blocklet level, so that arrow byte[] will not exceed INT_MAX
+   *
+   * @param carbonSchema
+   * @return
+   * @throws Exception
+   */
+  public long readArrowBatchAddress(Schema carbonSchema) throws Exception {
 
 Review comment:
   Done. Added to the existing test case.
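
   The Javadoc note about INT_MAX in the hunk above follows from Java arrays
   being indexed by int, so a single byte[] cannot hold more than
   Integer.MAX_VALUE bytes. Below is a minimal sketch of that constraint,
   assuming each batch is already serialized to a byte[]. The class
   ArrowBatchSizeSketch and its methods are hypothetical stand-ins for
   illustration only. They are not part of the CarbonData SDK or of Apache
   Arrow, and the loop only mirrors the shape of the hasNext() loop in
   readArrowBatch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration; not part of the CarbonData SDK. */
final class ArrowBatchSizeSketch {

    /** Upper bound on byte[] length implied by int indexing (real JVM limits are slightly lower). */
    static final long MAX_ARRAY_BYTES = Integer.MAX_VALUE;

    /** Returns true when a running total of bytes still fits in one byte[]. */
    static boolean fitsInByteArray(long totalBytes) {
        return totalBytes >= 0 && totalBytes <= MAX_ARRAY_BYTES;
    }

    /**
     * Drains every batch into one buffer, tracking the total as a long so the
     * check itself cannot overflow, and fails early once the limit is passed.
     */
    static byte[] drain(Iterator<byte[]> batches, long limit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long total = 0;
        while (batches.hasNext()) {
            byte[] batch = batches.next();
            total += batch.length;
            if (total > limit) {
                throw new IllegalStateException(
                    "serialized size " + total + " exceeds limit " + limit);
            }
            out.write(batch, 0, batch.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Two tiny stand-in batches; real batches would come from the reader.
        List<byte[]> batches = Arrays.asList(new byte[] {1, 2}, new byte[] {3});
        byte[] all = drain(batches.iterator(), MAX_ARRAY_BYTES);
        System.out.println(all.length); // prints 3
    }
}
```

   Keeping each reader scoped to a small unit of data, as the Javadoc
   suggests, is one way to keep the accumulated total under this bound. The
   explicit limit parameter here only makes the failure mode visible.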

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services