Spark ships with broadcast variables, which allow us to keep a read-only variable cached on each machine rather than shipping a copy of it with every task.
Of course, when a broadcast variable isn't going to be used anymore, it is natural to delete it. But from the documentation, it seems there are two ways of deleting broadcast variables:
unpersist() // Asynchronously delete cached copies of this broadcast on the executors.
destroy()   // Destroy all data and metadata related to this broadcast variable.
I am not sure I properly understand everything: does unpersist() do the same thing as destroy(), but asynchronously? This is unclear to me.
As implemented in the two currently available concrete implementations (HttpBroadcast and TorrentBroadcast), there are two differences:

- destroy is blocking (it awaits confirmation), while unpersist is by default non-blocking;
- destroy removes persisted blocks from the driver, while unpersist doesn't.

Otherwise these use the same logic of the BlockManagerMaster.
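To make the difference concrete, here is a minimal sketch of the broadcast lifecycle. It assumes a local SparkSession; the map contents, variable names, and app name are illustrative, and the blocking/non-blocking defaults may vary across Spark versions, so pass the flag explicitly when the distinction matters:

```scala
import org.apache.spark.sql.SparkSession

object BroadcastLifecycle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")           // illustrative local setup
      .appName("broadcast-lifecycle")
      .getOrCreate()
    val sc = spark.sparkContext

    // Ship a read-only lookup table to every executor once.
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
    val total = sc.parallelize(Seq("a", "b", "a"))
      .map(k => lookup.value.getOrElse(k, 0))
      .sum()

    // unpersist: frees the cached copies on the executors only.
    // The variable can still be used afterwards; it will simply be
    // re-shipped to executors on the next job that reads it.
    lookup.unpersist(blocking = true) // explicit flag: wait for removal

    // destroy: removes all data and metadata, on the executors AND
    // the driver. The broadcast variable cannot be used again;
    // reading lookup.value after this throws a SparkException.
    lookup.destroy()

    spark.stop()
  }
}
```

The practical rule of thumb that follows from the answer: call unpersist when you want to reclaim executor memory but might reuse the variable, and destroy only when you are certain the variable is no longer needed anywhere.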