ajantha-bhat commented on a change in pull request #3193: [CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK
URL:
https://github.com/apache/carbondata/pull/3193#discussion_r281581869
##########
File path: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java
##########
@@ -94,6 +96,43 @@ public T readNextRow() throws IOException, InterruptedException {
return currentReader.getCurrentValue();
}
+ /**
+ * Carbon reader will fill the arrow vector after reading the carbondata files.
+ * This arrow byte[] can be used to create arrow table and used for in memory analytics
+ *
+ * Note: create a reader at blocklet level, so that arrow byte[] will not exceed INT_MAX
+ *
+ * @param carbonSchema
+ * @return
+ * @throws Exception
+ */
+ public byte[] readArrowBatch(Schema carbonSchema) throws Exception {
+ ArrowConverter arrowConverter = new ArrowConverter(carbonSchema, 10000);
+ while (hasNext()) {
+ arrowConverter.addToArrowBuffer(readNextBatchRow());
+ }
+ return arrowConverter.toSerializeArray();
+ }
+
+ /**
+ * Carbon reader will fill the arrow vector after reading carbondata files.
+ * Here unsafe memory address will be returned instead of byte[],
+ * so that this address can be sent across java to python or c modules and
+ * can directly read the content from this unsafe memory
+ *
+ * Note: create a reader at blocklet level, so that arrow byte[] will not exceed INT_MAX
+ *
+ * @param carbonSchema
+ * @return
+ * @throws Exception
+ */
+ public long readArrowBatchAddress(Schema carbonSchema) throws Exception {
+ ArrowConverter arrowConverter = new ArrowConverter(carbonSchema, 10000);
Review comment:
It's only an initial size, so there is no need to make it configurable. I checked the Arrow code, and it uses 0 as the initial value. Let me use the same.
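
As a rough illustration of why an initial size rarely needs to be configurable, here is a minimal sketch using only the JDK. It is an analogy, not the Arrow API or the `ArrowConverter` code from this PR: `InitialSizeSketch` and `fill` are hypothetical names, and `ByteArrayOutputStream` merely stands in for any buffer that grows on demand. The initial size changes how often the buffer reallocates, not what it ends up holding.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class InitialSizeSketch {

    // Hypothetical helper: writes `total` bytes into a growable buffer
    // that was created with the given initial size.
    static byte[] fill(int initialSize, int total) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(initialSize);
        for (int i = 0; i < total; i++) {
            out.write(i & 0xFF); // the buffer grows whenever it runs out of room
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same payload, two different initial sizes.
        byte[] fromZero = fill(0, 25000);
        byte[] fromTenThousand = fill(10000, 25000);

        // Both buffers end up with identical contents; only the number of
        // internal reallocations along the way differs.
        System.out.println(fromZero.length == fromTenThousand.length);
        System.out.println(Arrays.equals(fromZero, fromTenThousand));
    }
}
```

Under that assumption, starting from 0 trades a few early reallocations for not reserving memory that a small read would never use.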
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[hidden email]
With regards,
Apache Git Services