
Is the 'load' command in Spark an action or a transformation?

df = spark.read.format('csv').load('...')

It is my understanding that load is a transformation and executes only when an action is called. However, when the load statement is executed, it shows up as if it were an action in the Spark UI.

Edit:

From the comments and answers, I inferred that load may or may not be a transformation, but it is definitely not an action, which is understandable.

If it is not an action, why does it create a DAG? It creates a DAG for the load statement alone, not just a WholeStageCodegen node (which appears in the SQL tab). Please see the image below: Screenshot

Asked by j raj on Nov 25 '25 at 06:11


1 Answer

Specifically, based on your comments:

load does nothing to the data by itself. It is just part of the sqlContext.read or spark.read.format API and supplies a parameter (the path), which can be set directly or indirectly on the read. read is what allows the data format to be specified.

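The DataFrame, or its underlying RDD, is evaluated lazily: nothing is computed over the data until an action is called. A job can still show up in the Spark UI at load time because, when no schema is supplied, Spark may need to read a small part of the source (for CSV, typically the first line) to work out the columns.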
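To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's implementation: the names ToyReader, ToyFrame and the reads log are hypothetical and exist only for illustration. It assumes a reader that peeks at the first line when no schema is given, builds a plan lazily for transformations, and scans the data only when an action runs.

```python
import csv
import io


class ToyReader:
    """Toy stand-in for a DataFrame reader (hypothetical, not Spark code)."""

    def __init__(self, source_text, schema=None):
        self.source_text = source_text
        self.schema = schema
        self.reads = []  # log of every time the source is actually touched

    def load(self):
        # With no schema, peek at the first line to learn the column names.
        # This small read is what would surface as a job in a UI.
        if self.schema is None:
            self.reads.append("schema peek")
            first_line = self.source_text.splitlines()[0]
            self.schema = next(csv.reader([first_line]))
        # No rows are processed here; only an empty plan is returned.
        return ToyFrame(self, plan=[])


class ToyFrame:
    """Toy lazy frame: transformations extend a plan, actions execute it."""

    def __init__(self, reader, plan):
        self.reader = reader
        self.plan = plan

    def filter(self, predicate):
        # Transformation: records the predicate, touches no data.
        return ToyFrame(self.reader, self.plan + [predicate])

    def count(self):
        # Action: the only place where the full source is scanned.
        self.reader.reads.append("full scan")
        rows = csv.DictReader(
            io.StringIO(self.reader.source_text),
            fieldnames=self.reader.schema,
        )
        next(rows)  # skip the header row
        return sum(1 for row in rows if all(p(row) for p in self.plan))


DATA = "name,age\nann,31\nbob,17\ncy,45\n"

inferred = ToyReader(DATA)
adults = inferred.load().filter(lambda row: int(row["age"]) >= 18)
reads_before_action = list(inferred.reads)  # only the schema peek so far
adult_count = adults.count()                # now the full scan happens

explicit = ToyReader(DATA, schema=["name", "age"])
explicit.load()                             # schema given: no read at all
```

In this model, load with an inferred schema logs one small read but still processes no rows, while load with an explicit schema touches nothing until count is called. In PySpark the analogous step would be passing a schema through the reader's schema(...) method before load(...); whether that removes every load-time job in a given setup is an assumption worth checking in the Spark UI.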

Answered by thebluephantom on Nov 28 '25 at 02:11


