
What does "cyclic data flow" mean in Apache Spark?

Tags: apache-spark

Spark is a DAG execution engine. Aren't "cyclic" and "DAG" (directed acyclic graph) opposite concepts? It's surprisingly hard to find an answer to this apparent contradiction.

As the article "Understanding your Apache Spark Application Through Visualization" shows, it is possible to visualize the execution DAG using the Spark UI. However, none of the examples on that page shows a cyclic data flow. The following image is one of those examples.

[Image: Spark execution DAG example]

Can these iterations (cyclic data flows) happen outside the graph? According to MapR, "Each Spark job creates a DAG of task stages to be performed on the cluster". So perhaps the cyclic data flow occurs between DAGs (jobs)?

Thank you.


1 Answer

OK, it seems this was a typo or a similar mistake in the documentation. As of today, the Spark homepage says:

Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing.
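
To see why an iterative program can still be acyclic, here is a toy sketch in plain Python. It is not the Spark API: the `Dataset` class, its `map` method, and the `is_acyclic` helper are hypothetical names made up for illustration. It assumes the usual pattern where the loop lives in the driver program and each pass derives a new immutable dataset from the previous one, so the iteration unrolls into a chain rather than an edge pointing back to an earlier node.

```python
# Toy model of dataset lineage. NOT the Spark API; all names are hypothetical.
class Dataset:
    _next_id = 0

    def __init__(self, parent=None, op="source"):
        self.id = Dataset._next_id
        Dataset._next_id += 1
        self.parent = parent  # the dataset this one was derived from
        self.op = op          # label of the transformation that produced it

    def map(self, op):
        # A transformation never mutates its input; it returns a new node
        # that points back at the dataset it was derived from.
        return Dataset(parent=self, op=op)

    def lineage(self):
        # Walk parent pointers back to the source, then list oldest first.
        node, chain = self, []
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))


def is_acyclic(dataset):
    # Follow parent pointers; seeing the same id twice would mean a cycle.
    seen = set()
    node = dataset
    while node is not None:
        if node.id in seen:
            return False
        seen.add(node.id)
        node = node.parent
    return True


# The "cycle" is only a loop in the driver code. Each pass rebinds the
# variable to a brand-new node, so the graph just grows longer.
ranks = Dataset()
for i in range(3):
    ranks = ranks.map(f"update_{i}")

ops = [d.op for d in ranks.lineage()]
print(ops)
print(is_acyclic(ranks))
```

In this sketch the three passes produce a four-node chain (the source plus one node per pass), which is consistent with the Spark UI examples never showing a back edge.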