I'm training a model where the input vector is the output of another model. This involves restoring the first model from a checkpoint file while initializing the second model from scratch (using tf.initialize_variables()) in the same process.
There is a substantial amount of code and abstraction, so I'm just pasting the relevant sections here.
The following is the restoring code:
# all_vars holds the graph's variables, collected earlier (not shown);
# this model's variables are picked out by their scope-name prefix.
self.variables = [var for var in all_vars if var.name.startswith(self.name)]
self.saver = tf.train.Saver(self.variables, max_to_keep=3)
self.save_path = tf.train.latest_checkpoint(os.path.dirname(self.checkpoint_path))
if should_restore:
    self.saver.restore(self.sess, self.save_path)
else:
    self.sess.run(tf.initialize_variables(self.variables))
Each model is scoped within its own graph and session, like this:
self.graph = tf.Graph()
self.sess = tf.Session(graph=self.graph)
with self.sess.graph.as_default():
    # Create variables and ops.
All the variables within each model are created within the variable_scope context manager.
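A minimal sketch of that per-model layout, assuming a wrapper class along these lines (the class name, shapes, and ops are illustrative, not the original code):

import tensorflow as tf   # TF 1.x API, as used throughout the question

class ScopedModel(object):
    # Illustrative wrapper: each model owns its own graph and its own session,
    # and all of its variables are created under a variable_scope named after it.
    def __init__(self, name):
        self.name = name
        self.graph = tf.Graph()
        self.sess = tf.Session(graph=self.graph)
        with self.sess.graph.as_default():
            with tf.variable_scope(self.name):
                self.input = tf.placeholder(tf.float32, [None, 128], name="input")
                w = tf.get_variable("w", shape=[128, 10])
                self.output = tf.matmul(self.input, w)   # variable names start with "name/"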
The feeding works as follows: a background thread calls sess.run(inference_op) of the first model on input = scipy.misc.imread(X) and puts the result into a blocking, thread-safe queue; the second model's training loop calls sess.run(train_op) on data pulled from that queue.
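To make the hand-off concrete, here is a minimal sketch of the producer side under my assumptions; feature_queue, produce_features, labelled_paths, and the first_model attributes (sess, inference_op, input) are illustrative names, not the original code:

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2
import scipy.misc

feature_queue = queue.Queue(maxsize=16)   # bounded: blocks the producer when full

def produce_features(first_model, labelled_paths):
    # Background thread: run the first model's inference_op on each image and
    # hand the (features, label) pair to the training thread via the queue.
    for path, label in labelled_paths:
        image = scipy.misc.imread(path)
        features = first_model.sess.run(
            first_model.inference_op,
            feed_dict={first_model.input: image})
        feature_queue.put((features, label))   # blocks until the consumer catches up

producer = threading.Thread(target=produce_features,
                            args=(first_model, labelled_paths))
producer.daemon = True
producer.start()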
PROBLEM:
I am observing that the loss values, even in the very first iteration of training the second model, change drastically across runs (and become NaN within a few iterations). I have confirmed that the output of the first model is exactly the same every time. Commenting out the sess.run of the first model and replacing it with identical input loaded from a pickled file does not show this behaviour.
This is the train_op:
# `labels` is assumed to be provided by the input pipeline (not shown here).
loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=network.feedforward(), labels=labels))
# Apply gradients.
with tf.control_dependencies([loss_op]):
    opt = tf.train.GradientDescentOptimizer(lr)
    grads = opt.compute_gradients(loss_op)
    apply_gradient_op = opt.apply_gradients(grads)
return apply_gradient_op
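And, assuming the queue from the producer sketch above, the consumer side might drive the returned apply_gradient_op roughly like this (second_model, its input/labels placeholders, and num_steps are again illustrative names):

num_steps = 10000   # illustrative
for step in range(num_steps):
    features, label = feature_queue.get()   # blocks until the producer puts an item
    _, loss_value = second_model.sess.run(
        [apply_gradient_op, loss_op],
        feed_dict={second_model.input: features, second_model.labels: label})
    if step % 100 == 0:
        print("step %d, loss %.4f" % (step, loss_value))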
I know this is vague, but I'm happy to provide more details. Any help is appreciated!
A note on graphs versus sessions in TensorFlow: a graph defines the computation. It does not compute anything and does not hold any values; it only describes the operations you specified in your code. A session executes graphs or parts of graphs. Graphs are also the format TensorFlow uses when exporting saved models from Python, and they are easy to optimize, e.g. the values of tensors can be statically inferred by folding constant nodes ("constant folding"), and independent parts of the computation can run in parallel. A session, by contrast, encapsulates the environment in which operations are executed and tensors are evaluated; TensorFlow needs a session to execute an operation and retrieve its computed value, and a session may own resources such as variables and queues (tf.QueueBase).
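A minimal example of that split between defining and executing:

import tensorflow as tf   # TF 1.x API

# Building the graph: nothing is computed yet, the ops are only described.
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b     # a candidate for constant folding at graph-optimization time

# Executing part of the graph inside a session produces the actual value.
with tf.Session() as sess:
    print(sess.run(c))    # 6.0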
The issue is almost certainly caused by the concurrent execution of different session objects. I moved the first model's session from the background thread to the main thread, repeated the controlled experiment several times (running for over 24 hours and reaching convergence), and never observed NaN. On the other hand, concurrent execution diverges the model within a few minutes.

I've restructured my code to use a common session object for all models.
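A hedged sketch of what that restructuring could look like: both models live in one graph, separated by variable scopes, and a single session drives them, so no two session objects ever run concurrently. The layer sizes, builder ops, and the checkpoint path "checkpoints/first_model" are placeholders for the real code, and the restore assumes a checkpoint actually exists there.

import tensorflow as tf   # TF 1.x API, matching the rest of the post

graph = tf.Graph()
with graph.as_default():
    # Both models in one graph, separated only by variable scope.
    with tf.variable_scope("first_model"):
        image_input = tf.placeholder(tf.float32, [None, 784])
        features = tf.layers.dense(image_input, 128, activation=tf.nn.relu)
    with tf.variable_scope("second_model"):
        labels = tf.placeholder(tf.int64, [None])
        logits = tf.layers.dense(features, 10)
        loss_op = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits, labels=labels))
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)

    # Restore only the first model's variables; initialize everything else.
    first_vars = [v for v in tf.global_variables()
                  if v.name.startswith("first_model")]
    other_vars = [v for v in tf.global_variables()
                  if not v.name.startswith("first_model")]
    saver = tf.train.Saver(first_vars)
    init_op = tf.variables_initializer(other_vars)

# One session owns both models, so a single sess.run call can drive the whole
# pipeline from image to gradient update.
sess = tf.Session(graph=graph)
saver.restore(sess, tf.train.latest_checkpoint("checkpoints/first_model"))
sess.run(init_op)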