
Saving a specific TensorFlow Checkpoint in time

Is it possible to mark checkpoints not to be deleted?

A little context:

I am creating a reinforcement learning model and I want to save my best model throughout training. To do that, I keep track of the best score and save a checkpoint whenever it is updated.

Unfortunately, my best_score checkpoints are getting deleted. I understand that the reason is that TF only keeps the newest 5 checkpoints, and this is fine.

I just want to keep the 5 most recent checkpoints AND the best checkpoint, which might not be among the most recent five. Is there a way to do this without storing all the checkpoints?

Thank you all!

Mr.Mundum asked Dec 13 '25 23:12



1 Answer

Looking at the issues posted here and here, this appears to be a requested feature that is not yet implemented. You can prevent any checkpoints from being deleted by creating the saver with saver = tf.train.Saver(max_to_keep=0). If your training run is long, then to keep this from filling up your disk I'd recommend two things: don't start saving checkpoints until a reasonable number of steps have passed, and don't save unless the current result beats the last saved result by some minimum amount.
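
The two saving conditions above could be sketched roughly as follows. This is only an illustration in plain Python, not part of TensorFlow: the class name BestCheckpointGate and the parameters warmup_steps and min_improvement are made up for this example, and it assumes a higher score is better. The saver and sess names in the trailing comment stand in for whatever objects your training loop already has.

```python
class BestCheckpointGate:
    """Decides whether a new 'best' checkpoint is worth writing.

    Hypothetical helper for illustration; assumes higher scores are better.
    """

    def __init__(self, warmup_steps, min_improvement):
        self.warmup_steps = warmup_steps        # steps to wait before saving anything
        self.min_improvement = min_improvement  # required margin over the last saved score
        self.best_saved_score = None            # score of the last checkpoint we approved

    def should_save(self, step, score):
        # Too early in training: skip saving entirely.
        if step < self.warmup_steps:
            return False
        # First eligible score, or a large enough improvement: approve and record it.
        if (self.best_saved_score is None
                or score >= self.best_saved_score + self.min_improvement):
            self.best_saved_score = score
            return True
        return False


gate = BestCheckpointGate(warmup_steps=1000, min_improvement=0.5)

# Inside the training loop (saver and sess are assumed to exist already):
# if gate.should_save(step, score):
#     saver.save(sess, "checkpoints/best", global_step=step)
```

With max_to_keep=0 nothing is ever deleted, so a gate like this is what limits how many checkpoints end up on disk.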

Stephen answered Dec 16 '25 20:12



