
Data/event exchange between jobs

Tags:

apache-flink

Is it possible in Apache Flink to create an application that consists of multiple jobs which together build a pipeline to process some data?

For example, consider a process with an input/preprocessing stage, a business-logic stage, and an output stage. In order to stay flexible during development and (re)deployment, I would like to run these as independent jobs.

Is it possible in Flink to build this and directly pipe the output of one job into the input of another (without external components)? If yes, where can I find documentation about this, and can it buffer data if one of the jobs is restarted? If no, does anyone have experience with such a setup who could point me to a possible solution?

Thank you!

Kan asked Dec 28 '25

1 Answer

If you really want separate jobs, then one way to connect them is via something like Kafka, where job A publishes, and job B (downstream) subscribes. Once you disconnect the two jobs, though, you no longer get the benefit of backpressure or unified checkpointing/saved state.
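For reference, here is a minimal sketch of what that Kafka bridge could look like with Flink's Kafka connector (flink-connector-kafka, Flink 1.14+ style API). The topic name "pipeline-events", the broker address, and the use of plain String records are assumptions for illustration; in practice the two methods below would be two separately deployed applications:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaBridgeSketch {

    // Job A (upstream): publish preprocessed records to the bridge topic.
    static void runUpstream() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("pipeline-events")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();
        env.fromElements("a", "b", "c")   // stand-in for the real preprocessing stage
           .sinkTo(sink);
        env.execute("job-a-preprocessing");
    }

    // Job B (downstream): subscribe to the same topic. The consumed offsets are part
    // of Flink's checkpoint state, so the job resumes where it left off when
    // restarted from a checkpoint or savepoint.
    static void runDownstream() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("pipeline-events")
                .setGroupId("job-b-business-logic")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "pipeline-events")
           .map(String::toUpperCase)      // stand-in for the real business logic
           .print();
        env.execute("job-b-business-logic");
    }
}

If end-to-end exactly-once matters, the sink also supports transactional writes (DeliveryGuarantee.EXACTLY_ONCE), at the cost of the downstream job only seeing records after the upstream checkpoint that committed them.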

Kafka can do buffering of course (up to some max amount of data), but that's not a solution to a persistent difference in performance, where the upstream job is generating data faster than the downstream job can consume it.
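To make that cap concrete: the topic's retention settings bound how much Kafka will buffer for a lagging downstream job. A sketch of creating the bridge topic with explicit limits via the Kafka AdminClient (the topic name, partition count, and limit values are made-up for illustration):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateBridgeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Records older than 24h, or beyond ~1 GiB per partition, become
            // eligible for deletion. A downstream job that falls further behind
            // than this will silently lose data.
            NewTopic topic = new NewTopic("pipeline-events", 4, (short) 1)
                    .configs(Map.of(
                            "retention.ms", "86400000",       // 24 hours
                            "retention.bytes", "1073741824"   // ~1 GiB per partition
                    ));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}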

I imagine you could also use files as the 'bridge' between jobs (a streaming file sink followed by a streaming file source), though that would typically add significant latency, as the downstream job has to wait for the upstream job to decide to complete a file before it can be consumed.
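For completeness, a sketch of that file-based variant using Flink's FileSink/FileSource (Flink 1.15+ style API; the directory path, rolling intervals, and part size are assumed values). The rolling policy is exactly where the latency comes from: a part file only becomes visible to the downstream source once it has been rolled and finalized:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

import java.time.Duration;

public class FileBridgeSketch {

    // Upstream job: write results as rolling text files into a shared directory.
    static FileSink<String> buildSink() {
        return FileSink
                .forRowFormat(new Path("/data/bridge"), new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(DefaultRollingPolicy.builder()
                        .withRolloverInterval(Duration.ofMinutes(15))  // files complete at most every 15 min
                        .withInactivityInterval(Duration.ofMinutes(5))
                        .withMaxPartSize(MemorySize.ofMebiBytes(128))
                        .build())
                .build();
    }

    // Downstream job: continuously watch the same directory for finished files.
    static FileSource<String> buildSource() {
        return FileSource
                .forRecordStreamFormat(new TextLineInputFormat(), new Path("/data/bridge"))
                .monitorContinuously(Duration.ofSeconds(30))
                .build();
    }
}

The downstream job would then wire the source in with env.fromSource(...), just as with the Kafka variant above; the worst-case end-to-end delay is roughly the rollover interval plus the monitoring interval.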

kkrugler answered Dec 31 '25


