Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between destroy() and unpersist()?

Spark is shipped with Broadcast variables, which allows us to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

Of course, when the "broadcasted variable" isn't going to be used anymore, it is natural to delete this variable. But from the documentation, it seems that there's two way of deleting broadcasted variables which are:

unpersist() //Destroy all data and metadata related to this broadcast variable.
destroy() //Asynchronously delete cached copies of this broadcast on the executors.

I am not sure to properly understand everything, does unpersist() does the same as delete() but synchronously? This is unclear for me.

like image 548
Will Avatar asked Nov 25 '15 15:11

Will


1 Answers

As it is implemented in the two currently available concrete implementations (HttpBroadcast and TorrentBroadacst) there are two differences:

  • destroy is blocking (awaits for confirmation) while unpersist is by default non-blocking
  • destroy removes persisted blocks from the driver while unpersist doesn't

Otherwise these use the same logic of the BlockMangerMaster.

like image 132
zero323 Avatar answered Sep 19 '22 16:09

zero323