df = spark.read.format('csv').load('...')
My understanding is that load is a transformation and executes only when an action is called. However, when the load statement runs, it shows up as an action in the Spark UI.
Edit:
From the comments and answers, I infer that load may or may not be a transformation, but it is definitely not an action, which makes sense.
If it is not an action, why does it create a DAG? A DAG is created for the load statement alone, not just a WholeStageCodegen node (which appears in the SQL tab). Please see the image below: Screenshot
Specifically, based on your comments:
load does nothing on its own. It is just part of the sqlContext.read or spark.read.format API, a parameter that can be set directly or indirectly on the read. read lets you specify the data format.
The DataFrame, or its underlying RDD, is evaluated lazily, as they say.