
Monitored training session save all checkpoints

Tags:

tensorflow

When using tf.train.MonitoredTrainingSession, is it possible to save all the checkpoints? It has a parameter (save_checkpoint_secs=600) that specifies how often a checkpoint is saved, but there is no option to specify how many checkpoints are kept.

With a plain tf.train.Saver(), the max_to_keep argument controls this.

Himanshu Sanghi asked Jan 28 '23 12:01



1 Answer

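You can pass your own tf.train.Saver to tf.train.MonitoredTrainingSession by wrapping it in a tf.train.Scaffold. The Saver's max_to_keep argument then controls how many recent checkpoints are retained (the default is 5); setting it to None or 0 means no checkpoints are deleted. Note that checkpoints are only written when checkpoint_dir is also passed to the session.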

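import tensorflow as tf

# Build the model (and its variables) before creating the Saver.
# max_to_keep=10 retains the 10 most recent checkpoints.
scaffold = tf.train.Scaffold(saver=tf.train.Saver(max_to_keep=10))
with tf.train.MonitoredTrainingSession(scaffold=scaffold) as sess:
    ...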
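To keep every checkpoint rather than the last 10, the same pattern can be used with max_to_keep=None. The following is a minimal sketch, not a definitive recipe. It assumes the TF1-style API is reached through tensorflow.compat.v1 (so it can also run under TensorFlow 2.x), and the helper name train_and_list_checkpoints, the toy train op, and the per-step save interval are all illustrative choices that are not part of the original answer.

```python
import glob
import os
import tempfile

# Assumption: TF1-style API via the compat module, so this also runs on TF 2.x.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()


def train_and_list_checkpoints(checkpoint_dir, num_steps=5):
    """Hypothetical helper: run a toy loop and return the checkpoint index files."""
    with tf.Graph().as_default():
        global_step = tf.train.get_or_create_global_step()
        # Stand-in for a real training op: just increments the global step.
        train_op = tf.assign_add(global_step, 1)

        # Create the Saver after the variables exist.
        # max_to_keep=None means old checkpoints are never deleted.
        scaffold = tf.train.Scaffold(saver=tf.train.Saver(max_to_keep=None))

        with tf.train.MonitoredTrainingSession(
                checkpoint_dir=checkpoint_dir,   # required for checkpoints to be written
                scaffold=scaffold,
                save_checkpoint_steps=1,         # save every step, only to make the effect visible
                save_checkpoint_secs=None,
                save_summaries_steps=None,       # summaries and step logging disabled for brevity
                save_summaries_secs=None,
                log_step_count_steps=None) as sess:
            for _ in range(num_steps):
                sess.run(train_op)

    # Each checkpoint produces a file named model.ckpt-<step>.index.
    return sorted(glob.glob(os.path.join(checkpoint_dir, "model.ckpt-*.index")))


if __name__ == "__main__":
    ckpts = train_and_list_checkpoints(tempfile.mkdtemp(), num_steps=7)
    print(len(ckpts), "checkpoints kept")
```

In real training you would normally keep save_checkpoint_secs (or a larger save_checkpoint_steps) instead of saving every step, and bear in mind that keeping every checkpoint can consume a lot of disk space.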
pfm answered Jan 31 '23 18:01
