In my application, I want to enrich an infinite stream of events. The stream itself is parallelised by hashing of an Id. For every event, there might be a call to an external source (e.g. REST, DB). This call is blocking by nature. The order of events within one stream partition must be maintained.
My idea was to create a RichMapFunction, which sets up the connection and then polls the external source for each event. The blocking call usually takes not to long, but in the worst case, the service could be down.
Theoretically, this works, but I don't feel good doing it this way, as I don't know how Flink reacts if you have some blocking operations within the stream. And what happens if you have a lot of parallel streams blocking, i.e. am I running out of threads? Or how is the behavior stream-upwards at the point where the stream is parallelised?
Does someone else may have a similar issue and an answer to my question or some ideas how to tackle it?
Netflix engineers recently published how they built Studio Search, using Apache Kafka streams, an Apache Flink-based Data Mesh process, and an Elasticsearch sink to manage the index.
The main reason for this is its stream processing feature, which manages to process rows upon rows of data in real time – which is not possible in Apache Spark's batch processing method. This makes Flink faster than Spark.
Flink enables real-time data analytics on streaming data and fits well for continuous Extract-transform-load (ETL) pipelines on streaming data and for event-driven applications as well.
According to the Apache Flink documentation, KeyBy transformation logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition.
RichMapFunction
is a good starting point but prefer RichAsyncFunction
which is asynchronous and which not block your processing !
Careful :
1- your DB access but also be asynchronous
2- your event order may change (according to the mode used)
More details : https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html
Hope it helps
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With