I'm trying to design a streaming architecture for streaming analytics. I'm exploring Kafka and Kafka Streams for stream processing and real-time/near-real-time messaging. My question: I need to query external systems (info providers, MongoDB, etc.) during stream processing. These queries could be either synchronous or asynchronous request-response, depending on the characteristics of the external system.
I've read this post explaining how to join a KStream and a KTable during processing, and it's very interesting, but in that scenario the KTable does not depend on input parameters coming from the KStream; it's just a streaming representation of a table.
I need to query the external system for each KStream message, passing some message fields as query parameters, enrich the streaming message with the query result, and then publish the enriched message to an output topic. Is there a consolidated pattern for designing this kind of stream processing? Is there a specific technology I should use? Keep in mind that queries can be synchronous or asynchronous.
I'd also like to design wrappers around these external systems, implementing a sort of distributed RPC callable from Kafka Streams processing. Could you suggest a technology or framework? I was considering Akka actors for distributing the query responders, but I can't tell whether Akka fits well with the request-response paradigm.
Thanks
Regarding the querying pattern for external systems, there are multiple possibilities.

The first option is to import the data from your external systems into Kafka (for example via Kafka Connect) and read it as KTables to do the KStream-KTable lookup join.
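Here is a minimal sketch of that first option, assuming String keys/values and hypothetical topic names ("events", "reference-data", "enriched-events"):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class KTableLookupJoinSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ktable-lookup-join");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // The event stream, keyed by the same id as the reference data
        KStream<String, String> events = builder.stream("events");
        // Reference data mirrored into a Kafka topic (e.g. via Kafka Connect), read as a changelog table
        KTable<String, String> reference = builder.table("reference-data");

        // Each event is enriched with the latest reference value for its key
        KStream<String, String> enriched =
                events.join(reference, (event, ref) -> event + "|" + ref);

        enriched.to("enriched-events");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Note that this join is keyed: the KTable value is looked up by the record key only, so it cannot take arbitrary message fields as query parameters, which is exactly the limitation you describe.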
The second option is to do the lookup within the topology itself, using the KStream methods #mapValues() or #map(), or lower level methods like #transform() or #process(). Thus, you manually open a connection to your external system and issue a lookup query for each record you process (you can use #mapValues() for example to implement this). Synchronous queries are straightforward with this approach; asynchronous queries are harder to get right, especially with regard to failure handling and offset commits.
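As a minimal sketch of this second option, assuming a hypothetical blocking ExternalLookupClient wrapper and String serdes, a synchronous per-record enrichment could look like this:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PerRecordLookupSketch {

    // Hypothetical wrapper around the external system (info provider, MongoDB, ...)
    interface ExternalLookupClient {
        String lookup(String queryParameter);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "per-record-lookup");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Stub client; a real implementation would issue the actual query
        ExternalLookupClient client = param -> "external-data-for-" + param;

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("events");

        // Synchronous enrichment: the stream thread blocks until the external query returns
        KStream<String, String> enriched =
                events.mapValues((key, value) -> value + "|" + client.lookup(key));

        enriched.to("enriched-events");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

If you need access to record metadata or state stores, or want to forward results yourself, #transform()/#process() give you the ProcessorContext instead of a plain value mapper.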
Compare this question about failure handling in Streams with regard to offset commits: How to handle error and don't commit when use Kafka Streams DSL