Running Tensorflow graph multiple times over different input parameters: what kind of loop is efficient?

For my particular problem, I need to re-run the once-constructed Tensorflow graph multiple times, each time re-initializing the variables to new values. Each execution of the graph is independent of the next. Think of it as setting up a model, and then training it 30 independent times with random initialisation per simulation. While I can achieve the above by placing my Session.run() statements inside of a for loop, I do not think that guarantees parallelism.
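For concreteness, the serial version I have in mind looks roughly like this (just a sketch; train_op and loss stand in for whatever the real graph defines):

    import tensorflow as tf

    # ... build the graph once: loss, train_op, etc. ...
    init_op = tf.global_variables_initializer()

    results = []
    with tf.Session() as sess:
        for sim in range(30):               # 30 independent simulations
            sess.run(init_op)               # fresh random initialization
            for step in range(1000):        # train this instance
                sess.run(train_op)
            results.append(sess.run(loss))  # record the final loss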

So the question is: What would be the most appropriate, Tensorflow-compatible way to run multiple independent sims? Should I do session.run() inside a python while loop, or should I perhaps employ the Tensorflow while_loop structure?

asked Nov 08 '22 by anna-earwen

1 Answer

This is an interesting question, and I'm working with ensembles of models myself.

First of all, training models in a loop trains them in series; neither Python loops nor tf.while_loop will give you any parallelism across instances. That said, tf.while_loop combined with tf.slice can be a very efficient way to process minibatches of data, albeit serially.
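A rough sketch of that tf.while_loop/tf.slice minibatch pattern (the dataset shape and batch size here are made up for illustration):

    import tensorflow as tf

    data = tf.random_normal([1000, 10])   # stand-in for a full dataset tensor
    batch_size, n_batches = 100, 10

    def cond(i, total):
        return i < n_batches

    def body(i, total):
        # Slice out minibatch i and accumulate a per-batch statistic.
        batch = tf.slice(data, tf.stack([i * batch_size, 0]), [batch_size, 10])
        return i + 1, total + tf.reduce_mean(batch)

    _, total = tf.while_loop(cond, body, [tf.constant(0), tf.constant(0.0)])

    with tf.Session() as sess:
        print(sess.run(total))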

One place to start would be looking at the "Distributed TensorFlow" guide (https://www.tensorflow.org/deploy/distributed). There's an extensive discussion of how to generate multiple concurrent (asynchronous) sessions. You can create sessions on the same device or across the network. I think this is capable of the kind of parallelism you have in mind (what the author calls "Between Graph Replication" with "Asynchronous Training").
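A heavily condensed sketch of that between-graph pattern, following the guide (the cluster layout, ports, the stand-in model, and the task_index handling are placeholders; each worker process would run its own copy of this script with its own task index):

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({
        "ps": ["localhost:2224"],
        "worker": ["localhost:2222", "localhost:2223"],
    })
    task_index = 0  # in a real setup, parsed from a command-line flag per process
    server = tf.train.Server(cluster, job_name="worker", task_index=task_index)

    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task_index,
            cluster=cluster)):
        # Trivial stand-in model; replace with the real graph.
        w = tf.get_variable("w", [10], initializer=tf.random_normal_initializer())
        loss = tf.reduce_sum(tf.square(w))
        global_step = tf.train.get_or_create_global_step()
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)

    hooks = [tf.train.StopAtStepHook(last_step=1000)]
    with tf.train.MonitoredTrainingSession(master=server.target,
                                           is_chief=(task_index == 0),
                                           hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)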

A second idea would be to stack your instances within a single model by adding an additional "instance" dimension to all your tensors. You'd also have to stack your training data input the same way (via feed_dict, queue, or dataset). You'd need to take particular care not to cross-link nodes across the individual instances (i.e., across the new instance dimension), but you could train them simultaneously by computing a shared cost function and using a standard optimizer. Once optimized, each slice of your weights and values along the instance dimension would represent a single simple model. You could additionally have an op that computes the vector of individual costs to monitor during training. This may not speed things up on your CPU (it could take roughly number of instances × single-instance time), but if you have a GPU, it may have a great deal of spare parallel capacity, so these expanded matrix ops could run in a similar number of cycles as a single training session.
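To make the stacking idea concrete, here is a small sketch using linear models with an extra leading "instance" dimension (the shapes and the linear model itself are purely illustrative):

    import tensorflow as tf

    n_instances, n_features = 30, 5

    # Training data stacked along a leading "instance" dimension.
    x = tf.placeholder(tf.float32, [n_instances, None, n_features])
    y = tf.placeholder(tf.float32, [n_instances, None, 1])

    # One weight matrix per instance, each with its own random initialization.
    w = tf.get_variable("w", [n_instances, n_features, 1],
                        initializer=tf.random_normal_initializer())

    pred = tf.matmul(x, w)  # batched matmul: instance i only ever sees w[i]
    per_instance_cost = tf.reduce_mean(tf.square(pred - y), axis=[1, 2])
    total_cost = tf.reduce_sum(per_instance_cost)  # shared scalar cost

    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(total_cost)
    # per_instance_cost (shape [n_instances]) can be fetched to monitor each model.

Because the total cost is just the sum of independent per-instance costs, the gradient for each instance's weights depends only on that instance's data, so the models stay decoupled even though one optimizer updates them all.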

Looking forward to hearing about what works for you!

UPDATE:

I was apparently incorrect about parallelism in tf.while_loop. From https://www.tensorflow.org/api_docs/python/tf/while_loop:

while_loop implements non-strict semantics, enabling multiple iterations to run in parallel. The maximum number of parallel iterations can be controlled by parallel_iterations, which gives users some control over memory consumption and execution order. For correct programs, while_loop should return the same result for any parallel_iterations > 0.
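For example (a toy sketch; the random draw stands in for one independent iteration of work):

    import tensorflow as tf

    n = 30

    def cond(i, acc):
        return i < n

    def body(i, acc):
        sample = tf.random_normal([])      # stand-in for one independent iteration
        return i + 1, acc.write(i, sample)

    acc0 = tf.TensorArray(tf.float32, size=n)
    _, acc = tf.while_loop(cond, body, [tf.constant(0), acc0],
                           parallel_iterations=10)  # up to 10 iterations in flight
    results = acc.stack()

    with tf.Session() as sess:
        print(sess.run(results))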

answered Nov 15 '22 by Joshua R.