I have some Scala code that I run on Spark using spark-submit. From what I understand, Spark builds a DAG in order to schedule the operations.
Is there a way to retrieve this DAG without actually performing the heavy operations, e.g. just by analyzing the code?
I would like a useful representation such as a data structure or at least a written representation, not the DAG visualization.
If you are using DataFrames (Spark SQL), you can call df.explain(true) to print the logical plans (parsed, analyzed and optimized) along with the physical plan, so you see the operations both before and after optimization.
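For a data structure rather than printed text, the plans behind explain are also reachable through df.queryExecution. The following is only a minimal sketch: the object name PlanSketch, the helper planNodeNames and the example query are made up for illustration, and the local[*] master is an assumption for running it outside spark-submit. queryExecution is a developer API, so its details may vary between Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object PlanSketch {
  // Hypothetical helper: collect the operator names of the optimized
  // logical plan by walking the plan tree. No Spark action is called here.
  def planNodeNames(df: DataFrame): Seq[String] =
    df.queryExecution.optimizedPlan.collect { case node => node.nodeName }

  def main(args: Array[String]): Unit = {
    // Assumption: local master for experimentation; with spark-submit the
    // master is normally supplied on the command line instead.
    val spark = SparkSession.builder()
      .appName("plan-sketch")
      .master("local[*]")
      .getOrCreate()

    // Illustrative query only; nothing below calls an action on it.
    val df = spark.range(0L, 1000000L)
      .withColumn("bucket", col("id") % 10)
      .groupBy("bucket")
      .count()

    // Written representation: logical plans plus the physical plan.
    df.explain(true)

    // Tree representation: each plan is a tree of operator nodes.
    val qe = df.queryExecution
    println(qe.optimizedPlan.treeString)
    println(qe.executedPlan.treeString)

    // Flattened list of operator names from the optimized logical plan.
    planNodeNames(df).foreach(println)

    spark.stop()
  }
}
```

The plan objects can be traversed with the usual tree methods (foreach, collect, children), which is usually enough to export them into whatever structure you need.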
If you are using RDDs, rdd.toDebugString returns a string representation of the lineage, and rdd.dependencies returns the dependency objects themselves, which you can walk recursively to get the tree.
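As a rough illustration of walking rdd.dependencies, the sketch below turns an RDD's lineage into a small tree of case classes. The names LineageSketch, LineageNode and toTree are hypothetical, the word-count pipeline is just sample input, and the local[*] master is an assumption for running it outside spark-submit.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LineageSketch {
  // Hypothetical tree node: one entry per RDD in the lineage.
  // viaShuffle marks whether the child reaches this RDD through a shuffle.
  final case class LineageNode(
      rddId: Int,
      rddType: String,
      viaShuffle: Boolean,
      parents: Seq[LineageNode])

  // Recursively follow rdd.dependencies up to the source RDDs.
  def toTree(rdd: RDD[_], viaShuffle: Boolean = false): LineageNode =
    LineageNode(
      rdd.id,
      rdd.getClass.getSimpleName,
      viaShuffle,
      rdd.dependencies.map { dep =>
        toTree(dep.rdd, dep.isInstanceOf[ShuffleDependency[_, _, _]])
      })

  def main(args: Array[String]): Unit = {
    // Assumption: local master for experimentation only.
    val conf = new SparkConf().setAppName("lineage-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Illustrative pipeline; no action (collect, count, ...) is called on it.
    val counts = sc.parallelize(Seq("a b", "b c"))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    println(counts.toDebugString) // written representation of the lineage
    println(toTree(counts))       // same lineage as a nested data structure

    sc.stop()
  }
}
```

Shuffle dependencies are the points where Spark splits a job into stages, so the viaShuffle flag gives a first approximation of the stage boundaries in the DAG.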
If you call these without triggering an action, you get a representation of what is going to happen without actually doing the heavy lifting.