Spark ships with broadcast variables, which allow us to keep a read-only variable cached on each machine rather than shipping a copy of it with every task.
Of course, when a broadcast variable isn't going to be used anymore, it is natural to delete it. But from the documentation, it seems there are two ways of deleting broadcast variables:
unpersist() // Asynchronously delete cached copies of this broadcast on the executors.
destroy()   // Destroy all data and metadata related to this broadcast variable.
I am not sure I properly understand everything: does unpersist() do the same thing as destroy(), but asynchronously? This is unclear to me.
As implemented in the two currently available concrete implementations (HttpBroadcast and TorrentBroadcast), there are two differences:

- destroy is blocking (it awaits confirmation), while unpersist is by default non-blocking;
- destroy removes persisted blocks from the driver, while unpersist doesn't.

Otherwise these use the same logic of the BlockManagerMaster.
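To make the difference concrete, here is a minimal sketch of the broadcast lifecycle. It assumes a local SparkSession; the map contents, variable names, and app name are illustrative, and the blocking/non-blocking defaults may vary across Spark versions, so pass the flag explicitly when the distinction matters:

```scala
import org.apache.spark.sql.SparkSession

object BroadcastLifecycle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")           // illustrative local setup
      .appName("broadcast-lifecycle")
      .getOrCreate()
    val sc = spark.sparkContext

    // Ship a read-only lookup table to every executor once.
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
    val total = sc.parallelize(Seq("a", "b", "a"))
      .map(k => lookup.value.getOrElse(k, 0))
      .sum()

    // unpersist: frees the cached copies on the executors only.
    // The variable can still be used afterwards; it will simply be
    // re-shipped to executors on the next job that reads it.
    lookup.unpersist(blocking = true) // explicit flag: wait for removal

    // destroy: removes all data and metadata, on the executors AND
    // the driver. The broadcast variable cannot be used again;
    // reading lookup.value after this throws a SparkException.
    lookup.destroy()

    spark.stop()
  }
}
```

The practical rule of thumb that follows from the answer: call unpersist when you want to reclaim executor memory but might reuse the variable, and destroy only when you are certain the variable is no longer needed anywhere.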