Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Apache Flink implement iteration?

DAG (directed acyclic graph) execution of big data is common. I am wondering how Apache Flink implements iterations given that it's possible that the graph can be cyclic.

like image 300
Pango Avatar asked Nov 24 '15 08:11

Pango


People also ask

How does Apache Flink work?

Flink is designed to run stateful streaming applications at any scale. Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster. Therefore, an application can leverage virtually unlimited amounts of CPUs, main memory, disk and network IO.

Is Flink a real-time streaming system?

Flink capabilities enable real-time insights from streaming data and event-based capabilities. Flink enables real-time data analytics on streaming data and fits well for continuous Extract-transform-load (ETL) pipelines on streaming data and for event-driven applications as well.

What language does Flink use?

The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala.

What is Apache Flink good for?

Flink is a distributed processing engine and a scalable data analytics framework. You can use Flink to process data streams at a large scale and to deliver real-time analytical insights about your processed data with your streaming application.


1 Answers

If Flink executes iterative programs, the dataflow graph is not a DAG but allows for cycles. However, this cycles are not arbitrary and must follow a certain pattern to allow Flink to control this cyclic flow to some extent.

There is often no strict technical reason in other systems for not supporting cycles. Allowing for cycles in a generic way is usually prohibited because it might result in an infinite loop (ie, that a tuple spins the cycle forever and the program does not terminate).

Flink tracks the cycle by counting the number of iterations. This way, Flink can track which tuples belong to which iterations and can, for example, avoid tuples from a new iteration "taking over" tuples from an older one. Furthermore, it allows Flink to detect if the result of iteration n and n+1 are equal or not. An equal result indicates a finished computation allowing Flink to break the infinite loop and terminate (this holds for so-called fix-point iterations).

For a detailed read look at this research paper: https://dl.acm.org/citation.cfm?id=2350245

The usage of iteration in your program is described here: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html#iteration-operators

like image 97
Matthias J. Sax Avatar answered Sep 30 '22 12:09

Matthias J. Sax