Is parallelize (and other load operations) executed only at the time a Spark action is executed or immediately when it is encountered?
See def parallelize in the Spark source code.
Note the different consequences, for instance for .textFile(...): lazy evaluation would mean that, while it may save some memory initially, the text file has to be read every time an action is performed, and a change to the text file would affect all actions performed after the change.
parallelize is executed lazily: see L726 of your cited code, which states "@note Parallelize acts lazily."
Execution in Spark is only triggered once you call an action, e.g. collect or count.
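To make the consequence raised in the question concrete, here is a minimal sketch in plain Python of how a lazily evaluated text source behaves. It is a toy model and not Spark itself: the class LazyLines, its stats counter and the demo function are hypothetical names made up for illustration, and only the method names map, collect and count are chosen to mirror Spark's. The assumption is simply that a transformation records work while an action performs it.

```python
import os
import tempfile


class LazyLines:
    """Toy stand-in for a lazily evaluated text source (NOT the Spark API)."""

    def __init__(self, path, transforms=(), stats=None):
        self.path = path
        self.transforms = tuple(transforms)
        # Shared counter of how often the file was actually opened.
        self.stats = stats if stats is not None else {"reads": 0}

    def map(self, fn):
        # "Transformation": only records fn, nothing is read yet.
        return LazyLines(self.path, self.transforms + (fn,), self.stats)

    def collect(self):
        # "Action": the file is read and all recorded functions are applied.
        self.stats["reads"] += 1
        with open(self.path) as fh:
            rows = [line.rstrip("\n") for line in fh]
        for fn in self.transforms:
            rows = [fn(row) for row in rows]
        return rows

    def count(self):
        # A second action re-runs the whole pipeline from the file.
        return len(self.collect())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.txt")
        with open(path, "w") as fh:
            fh.write("a\nb\n")

        upper = LazyLines(path).map(str.upper)
        reads_before_action = upper.stats["reads"]  # still 0: nothing ran

        first = upper.collect()  # first action reads the file

        with open(path, "a") as fh:
            fh.write("c\n")  # change the file between two actions

        second = upper.count()  # second action sees the changed file
        return reads_before_action, first, second, upper.stats["reads"]


if __name__ == "__main__":
    print(demo())
```

In this sketch no read happens until collect is called, each action opens the file again, and the count taken after the file was modified reflects the extra line. That is the behaviour the question describes for lazy evaluation.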
Thus in total with Spark: