I'm training a model where the input vector is the output of another model. This involves restoring the first model from a checkpoint file while initializing the second model from scratch (using tf.initialize_variables()) in the same process.
There is a substantial amount of code and abstraction, so I'm just pasting the relevant sections here.
The following is the restoring code:
# all_vars holds the graph's variables, collected earlier (not shown);
# this model's variables are picked out by their scope-name prefix.
self.variables = [var for var in all_vars if var.name.startswith(self.name)]
self.saver = tf.train.Saver(self.variables, max_to_keep=3)
self.save_path = tf.train.latest_checkpoint(os.path.dirname(self.checkpoint_path))
if should_restore:
    self.saver.restore(self.sess, self.save_path)
else:
    self.sess.run(tf.initialize_variables(self.variables))
Each model is scoped within its own graph and session, like this:
self.graph = tf.Graph()
self.sess = tf.Session(graph=self.graph)
with self.sess.graph.as_default():
    # Create variables and ops.
All the variables within each model are created within the variable_scope context manager.
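A minimal sketch of that per-model layout, assuming a wrapper class along these lines (the class name, shapes, and ops are illustrative, not the original code):

import tensorflow as tf   # TF 1.x API, as used throughout the question

class ScopedModel(object):
    # Illustrative wrapper: each model owns its own graph and its own session,
    # and all of its variables are created under a variable_scope named after it.
    def __init__(self, name):
        self.name = name
        self.graph = tf.Graph()
        self.sess = tf.Session(graph=self.graph)
        with self.sess.graph.as_default():
            with tf.variable_scope(self.name):
                self.input = tf.placeholder(tf.float32, [None, 128], name="input")
                w = tf.get_variable("w", shape=[128, 10])
                self.output = tf.matmul(self.input, w)   # variable names start with "name/"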
The feeding works as follows: a background thread calls sess.run(inference_op) of the first model on input = scipy.misc.imread(X) and puts the result into a blocking, thread-safe queue; the second model's training loop calls sess.run(train_op) on data pulled from that queue.
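To make the hand-off concrete, here is a minimal sketch of the producer side under my assumptions; feature_queue, produce_features, labelled_paths, and the first_model attributes (sess, inference_op, input) are illustrative names, not the original code:

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2
import scipy.misc

feature_queue = queue.Queue(maxsize=16)   # bounded: blocks the producer when full

def produce_features(first_model, labelled_paths):
    # Background thread: run the first model's inference_op on each image and
    # hand the (features, label) pair to the training thread via the queue.
    for path, label in labelled_paths:
        image = scipy.misc.imread(path)
        features = first_model.sess.run(
            first_model.inference_op,
            feed_dict={first_model.input: image})
        feature_queue.put((features, label))   # blocks until the consumer catches up

producer = threading.Thread(target=produce_features,
                            args=(first_model, labelled_paths))
producer.daemon = True
producer.start()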
PROBLEM:
I am observing that the loss values, even in the very first iteration of training the second model, change drastically across runs (and become NaN within a few iterations). I have confirmed that the output of the first model is exactly the same every time. Commenting out the sess.run of the first model and replacing it with identical input loaded from a pickled file does not show this behaviour.
This is the train_op:
# `labels` is assumed to be provided by the input pipeline (not shown here).
loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=network.feedforward(), labels=labels))
# Apply gradients.
with tf.control_dependencies([loss_op]):
    opt = tf.train.GradientDescentOptimizer(lr)
    grads = opt.compute_gradients(loss_op)
    apply_gradient_op = opt.apply_gradients(grads)
return apply_gradient_op
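And, assuming the queue from the producer sketch above, the consumer side might drive the returned apply_gradient_op roughly like this (second_model, its input/labels placeholders, and num_steps are again illustrative names):

num_steps = 10000   # illustrative
for step in range(num_steps):
    features, label = feature_queue.get()   # blocks until the producer puts an item
    _, loss_value = second_model.sess.run(
        [apply_gradient_op, loss_op],
        feed_dict={second_model.input: features, second_model.labels: label})
    if step % 100 == 0:
        print("step %d, loss %.4f" % (step, loss_value))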
I know this is vague, but I'm happy to provide more details. Any help is appreciated!
A note on graphs versus sessions in TensorFlow: a graph defines the computation. It does not compute anything and does not hold any values; it only describes the operations you specified in your code. A session executes graphs or parts of graphs. Graphs are also the format TensorFlow uses when exporting saved models from Python, and they are easy to optimize, e.g. the values of tensors can be statically inferred by folding constant nodes ("constant folding"), and independent parts of the computation can run in parallel. A session, by contrast, encapsulates the environment in which operations are executed and tensors are evaluated; TensorFlow needs a session to execute an operation and retrieve its computed value, and a session may own resources such as variables and queues (tf.QueueBase).
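A minimal example of that split between defining and executing:

import tensorflow as tf   # TF 1.x API

# Building the graph: nothing is computed yet, the ops are only described.
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b     # a candidate for constant folding at graph-optimization time

# Executing part of the graph inside a session produces the actual value.
with tf.Session() as sess:
    print(sess.run(c))    # 6.0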
The issue is almost certainly caused by the concurrent execution of different session objects. I moved the first model's session from the background thread to the main thread, repeated the controlled experiment several times (running for over 24 hours and reaching convergence), and never observed NaN. On the other hand, concurrent execution diverges the model within a few minutes.

I've restructured my code to use a common session object for all models.
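A hedged sketch of what that restructuring could look like: both models live in one graph, separated by variable scopes, and a single session drives them, so no two session objects ever run concurrently. The layer sizes, builder ops, and the checkpoint path "checkpoints/first_model" are placeholders for the real code, and the restore assumes a checkpoint actually exists there.

import tensorflow as tf   # TF 1.x API, matching the rest of the post

graph = tf.Graph()
with graph.as_default():
    # Both models in one graph, separated only by variable scope.
    with tf.variable_scope("first_model"):
        image_input = tf.placeholder(tf.float32, [None, 784])
        features = tf.layers.dense(image_input, 128, activation=tf.nn.relu)
    with tf.variable_scope("second_model"):
        labels = tf.placeholder(tf.int64, [None])
        logits = tf.layers.dense(features, 10)
        loss_op = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits, labels=labels))
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)

    # Restore only the first model's variables; initialize everything else.
    first_vars = [v for v in tf.global_variables()
                  if v.name.startswith("first_model")]
    other_vars = [v for v in tf.global_variables()
                  if not v.name.startswith("first_model")]
    saver = tf.train.Saver(first_vars)
    init_op = tf.variables_initializer(other_vars)

# One session owns both models, so a single sess.run call can drive the whole
# pipeline from image to gradient update.
sess = tf.Session(graph=graph)
saver.restore(sess, tf.train.latest_checkpoint("checkpoints/first_model"))
sess.run(init_op)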