When we talk about RDD graphs, does it mean lineage graph or DAG (direct acyclic graph) or both? and when is the lineage graph generated? is it generated before the DAG of Spark tasks?
When one derives the new RDD from existing (previous) RDD using transformation, Spark keeps the track of all the dependencies between RDD is called lineage graph. (1) When there is a demand for computing the new RDD.
What is DAG in Spark? DAG means that Directed Acyclic Graph(No directed cycles). It is a set of Edges and Vertices, where vertices act the RDDs. Edges act as the Operation to be applied on RDD. It is a collection of all transformation and actions. Lineage Graph vs DAG: Lineage Graph is dealing with only RDDs so it is applicable to transformations
How the RDD lineage graph happens in programmatically: What is DAG in Spark? DAG means that Directed Acyclic Graph (No directed cycles). It is a set of Edges and Vertices, where vertices act the RDDs. Edges act as the Operation to be applied on RDD. It is a collection of all transformation and actions.
RDD Lineage is just a portion of a DAG (one or more operations) that lead to the creation of that particular RDD. So, one DAG (one Spark program) might create multiple RDDs, and each RDD will have its lineage (i.e that path in your DAG that lead to that RDD).
An RDD can depend on zero or more other RDDs. For example when you say x = y.map(...)
, x
will depend on y
. These dependency relationships can be thought of as a graph.
You can call this graph a lineage graph, as it represents the derivation of each RDD. It is also necessarily a DAG, since a loop is impossible to be present in it.
Narrow dependencies, where a shuffle is not required (think map
and filter
) can be collapsed into a single stage. Stages are a unit of execution, and they are generated by the DAGScheduler
from the graph of RDD dependencies. Stages also depend on each other. The DAGScheduler
builds and uses this dependency graph (which is also necessarily a DAG) to schedule the stages.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With