
Akka streams vs Apache Flink

While exploring Akka Streams, I also came across Apache Flink, which is a stream processing engine. Akka Streams implements Reactive Streams and supports backpressure.

So if I have to decide between the two, which one should I go for? How do they differ, and what are the similarities? What should the criteria be?

Mandroid asked Dec 11 '22 00:12



1 Answer

Akka Streams is a library implementing the Reactive Streams specification.

Apache Flink is a streaming engine.

The main high-level difference is that in Apache Flink you create a job by coding against one of the Flink APIs and submit that job to an Apache Flink cluster. It is the Apache Flink cluster that executes your stream processing job. With Akka Streams you are creating a standalone application. In that sense Akka Streams is the more lightweight of the two.

You can still distribute an Akka Streams based app by using StreamRefs, though you need to do that explicitly in the code and you need to run Akka Cluster. Apache Flink already manages a cluster, so you don't need to do that explicitly in your code (though you still need the cluster set up and running to submit your jobs to). Apache Flink also has smarts built in to take a job and execute it in an optimal way, parallelizing and distributing execution when possible. You don't get that with Akka Streams.

Apache Flink stream processing is designed to achieve end-to-end exactly-once processing semantics in the face of failures. In Akka Streams such a guarantee would need to be implemented explicitly in your code.
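To give a feel for what "implemented explicitly in your code" can involve, here is a minimal sketch of one common approach: making the processing step idempotent by remembering the last source offset it applied, so that records replayed after a failure are not counted twice. This is plain Java with no Akka or Flink APIs, and it is not how Flink itself achieves the guarantee; `Event`, `IdempotentSum` and `replayDemo` are hypothetical names, and the sketch assumes a replayable source with increasing offsets.

```java
import java.util.List;

final class EffectivelyOnceSketch {

    /** Hypothetical record from a replayable source; offset is its position in the source. */
    static final class Event {
        final long offset;
        final int amount;

        Event(long offset, int amount) {
            this.offset = offset;
            this.amount = amount;
        }
    }

    /**
     * Keeps a running total together with the last applied offset.
     * In a real system both values would have to be persisted atomically.
     */
    static final class IdempotentSum {
        private long lastAppliedOffset = -1;
        private long total = 0;

        void apply(Event event) {
            if (event.offset <= lastAppliedOffset) {
                return; // already applied before the failure, so skip the replayed duplicate
            }
            total += event.amount;
            lastAppliedOffset = event.offset;
        }

        long total() {
            return total;
        }
    }

    /** Simulates a crash after two events, followed by a replay from offset 0. */
    static long replayDemo() {
        IdempotentSum sum = new IdempotentSum();
        List<Event> firstAttempt = List.of(new Event(0, 10), new Event(1, 20));
        List<Event> replay = List.of(new Event(0, 10), new Event(1, 20), new Event(2, 5));
        firstAttempt.forEach(sum::apply);
        replay.forEach(sum::apply);
        return sum.total();
    }

    public static void main(String[] args) {
        System.out.println(replayDemo()); // prints 35, not 65
    }
}
```

The point of the sketch is only that the bookkeeping (offsets, deduplication, atomic persistence) is yours to write and get right, whereas in Flink it is part of the engine's design.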

Akka Streams, as an implementation of the Reactive Streams specification, is all about asynchronous, memory-bounded processing. Akka HTTP, for example, is built on top of Akka Streams and as a result provides very efficient and lightweight client and server sides of the HTTP protocol.

Akka Streams implements asynchronous non-blocking backpressure (as per the Reactive Streams specification) to guarantee memory boundedness during execution. Apache Flink also has a backpressure mechanism, though it's not implemented in the same way.
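The demand-driven idea behind Reactive Streams backpressure can be sketched without any external dependency, because the JDK ships the same four interfaces as `java.util.concurrent.Flow`. The example below is not Akka Streams code (Akka Streams does this demand signalling for you inside its operators); it is a rough illustration in which the subscriber requests one element at a time and the publisher's buffer is capped. `BackpressureSketch`, `OneAtATimeSubscriber` and the buffer size of 4 are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

final class BackpressureSketch {

    /** Subscriber that never has more than one element of outstanding demand. */
    static final class OneAtATimeSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // initial demand: exactly one element
        }

        @Override
        public void onNext(Integer item) {
            received.add(item);
            subscription.request(1); // ask for the next element only after handling this one
        }

        @Override
        public void onError(Throwable throwable) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    /** Publishes 1..count through a bounded buffer and returns what the subscriber received. */
    static List<Integer> run(int count) {
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // At most 4 elements are buffered per subscriber, so memory use stays bounded.
            SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>(executor, 4);
            publisher.subscribe(subscriber);
            for (int i = 1; i <= count; i++) {
                publisher.submit(i); // blocks while the subscriber's buffer is full
            }
            publisher.close(); // signals onComplete once buffered elements are delivered
            if (!subscriber.done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("stream did not complete in time");
            }
            return subscriber.received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(20));
    }
}
```

Note that `SubmissionPublisher.submit` slows the producer down by blocking, which is simpler than the fully non-blocking demand propagation in Akka Streams; the bounded buffer and the `request(n)` signalling are the parts that carry over.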

Akka Streams, as an implementation of the Reactive Streams specification, can interoperate with other implementations such as RxJava or Project Reactor. Apache Flink is not part of any broader standard.

I would say the main reasons to go for Apache Flink are the exactly-once guarantees and the automated distribution that comes with it. Otherwise Akka Streams is a very powerful API with a simpler execution model.

EDIT: Probably worth mentioning the Alpakka project, which brings a lot of technologies to Akka Streams so that they can be plugged into Reactive Streams based processing.

artur answered Jan 04 '23 21:01
