DAG (directed acyclic graph) execution of big data is common. I am wondering how Apache Flink implements iterations given that it's possible that the graph can be cyclic.
Flink is designed to run stateful streaming applications at any scale. Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster. Therefore, an application can leverage virtually unlimited amounts of CPUs, main memory, disk and network IO.
Flink capabilities enable real-time insights from streaming data and event-based capabilities. Flink enables real-time data analytics on streaming data and fits well for continuous Extract-transform-load (ETL) pipelines on streaming data and for event-driven applications as well.
The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala.
Flink is a distributed processing engine and a scalable data analytics framework. You can use Flink to process data streams at a large scale and to deliver real-time analytical insights about your processed data with your streaming application.
If Flink executes iterative programs, the dataflow graph is not a DAG but allows for cycles. However, this cycles are not arbitrary and must follow a certain pattern to allow Flink to control this cyclic flow to some extent.
There is often no strict technical reason in other systems for not supporting cycles. Allowing for cycles in a generic way is usually prohibited because it might result in an infinite loop (ie, that a tuple spins the cycle forever and the program does not terminate).
Flink tracks the cycle by counting the number of iterations. This way, Flink can track which tuples belong to which iterations and can, for example, avoid tuples from a new iteration "taking over" tuples from an older one. Furthermore, it allows Flink to detect if the result of iteration n
and n+1
are equal or not. An equal result indicates a finished computation allowing Flink to break the infinite loop and terminate (this holds for so-called fix-point iterations).
For a detailed read look at this research paper: https://dl.acm.org/citation.cfm?id=2350245
The usage of iteration in your program is described here: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html#iteration-operators
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With