http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Should-CarbonData-need-to-integrate-with-Spark-Streaming-too-tp35341p35415.html
Thanks for starting this discussion about adding Spark Streaming support.
1. Please try to reuse the current code (Structured Streaming) instead of
adding separate logic for Spark Streaming.
2. I suggest that Structured Streaming remain the default; please consider
how to add a configuration option for enabling or switching to Spark
Streaming. A hypothetical sketch of such a switch follows.
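For illustration only, the switch could go through CarbonProperties; the
property name 'carbon.streaming.api' below is hypothetical, not an existing
option, and the actual name would be decided in the design:

    import org.apache.carbondata.core.util.CarbonProperties

    // Hypothetical switch; the idea is that Structured Streaming
    // stays the default behaviour unless the user opts in.
    CarbonProperties.getInstance()
      .addProperty("carbon.streaming.api", "spark-streaming")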
> Hi dev,
> Currently CarbonData 1.3 (to be released soon) only supports integration
> with Spark Structured Streaming, which requires Kafka version >= 0.10. I
> think there are still many users integrating Spark Streaming with Kafka
> 0.8 (at least our cluster does), and the cost of upgrading Kafka is too
> high. So should CarbonData integrate with Spark Streaming too?
>
> I think there are two ways to integrate with Spark Streaming, as follows:
> 1). CarbonData batch data loading + auto compaction
> Use CarbonSession.createDataFrame to convert the RDD to a DataFrame inside
> InputDStream.foreachRDD, and then save the data into a CarbonData table
> with auto compaction enabled. This way it is also possible to create
> pre-aggregate tables on the main table (a streaming table does not support
> pre-aggregate tables).
>
> I can test this approach in our QA environment and add an example to
> CarbonData.
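>
> A rough sketch of what I have in mind for 1) — the socket source, the
> Event schema and the table name 'streaming_main' are placeholders; for
> Kafka 0.8 the source would be KafkaUtils.createDirectStream instead:
>
>     import org.apache.spark.SparkConf
>     import org.apache.spark.sql.{SaveMode, SparkSession}
>     import org.apache.spark.sql.CarbonSession._
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>     case class Event(id: Int, value: String)
>
>     object CarbonSparkStreamingExample {
>       def main(args: Array[String]): Unit = {
>         val conf = new SparkConf().setAppName("CarbonSparkStreamingExample")
>         val carbon = SparkSession.builder()
>           .config(conf)
>           .getOrCreateCarbonSession("/tmp/carbon.store")
>         val ssc = new StreamingContext(carbon.sparkContext, Seconds(10))
>
>         // Placeholder source; replace with KafkaUtils.createDirectStream
>         // for a Kafka 0.8 cluster.
>         val lines = ssc.socketTextStream("localhost", 9999)
>         val events = lines.map { line =>
>           val fields = line.split(",")
>           Event(fields(0).toInt, fields(1))
>         }
>
>         events.foreachRDD { rdd =>
>           if (!rdd.isEmpty()) {
>             // Each mini-batch becomes a normal batch load; auto compaction
>             // (enabled on the table) merges the small segments later.
>             carbon.createDataFrame(rdd)
>               .write
>               .format("carbondata")
>               .option("tableName", "streaming_main")
>               .mode(SaveMode.Append)
>               .save()
>           }
>         }
>
>         ssc.start()
>         ssc.awaitTermination()
>       }
>     }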
>
> 2). The same mechanism as the Structured Streaming integration
> In this approach, Structured Streaming appends every mini-batch into a
> stream segment (row format). When the size of the stream segment exceeds
> 'carbon.streaming.segment.max.size', the stream segment is automatically
> converted into a batch segment (column format) at the beginning of the
> next batch, and a new stream segment is created to append data.
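>
> For reference, that threshold is an ordinary Carbon property; a minimal
> snippet to tune it (the value is in bytes, and 1024000000 is the default
> if I remember correctly):
>
>     import org.apache.carbondata.core.util.CarbonProperties
>
>     // Stream segments larger than this are handed over to a columnar
>     // batch segment at the start of the next mini-batch.
>     CarbonProperties.getInstance()
>       .addProperty("carbon.streaming.segment.max.size", "1024000000")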
> However, I have no idea yet how to integrate this handover with Spark
> Streaming; *any suggestions*?