
Why does TensorFlow sometimes run slower and slower as training progresses?

Tags:

tensorflow

I am training an RNN. The first epoch took 7.5 hours, but TensorFlow got slower and slower as training went on, and the second epoch took 55 hours. I checked the code, and the calls that slow down over time are mostly these:

  1. session.run([var1, var2, ...], feed_dict=feed),
  2. tensor.eval(feed_dict=feed).

For example, take the line session.run([var1, var2, ...], feed_dict=feed). When the program starts, it takes 0.1 seconds, but the time spent on this one line keeps growing as the process runs; after 10 hours it takes 10 seconds.

I have run into this several times. What causes it, and how can I avoid it?

Does this line of code add nodes to the TensorFlow graph: self.shapes = [numpy.zeros(g[1].get_shape(), numpy.float32) for g in self.compute_gradients]? I suspect this may be the reason. This line is called periodically, many times, and self is not a tf.train.Optimizer object.

HY G asked Aug 22 '16 02:08



1 Answer

Try finalizing your graph after you create it (graph.finalize()). This prevents any further operations from being added to the graph, so if something in your loop tries to add one, you get an error at that line instead of a silent slowdown. I also think self.compute_gradients is adding operations to the graph. Try defining the operation once, outside your loop, and only running it inside your loop.
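
As a rough illustration of the advice above, here is a minimal sketch, not the asker's actual code. It assumes the TF 1.x-style API (reached here through tensorflow.compat.v1) and a toy linear model standing in for the RNN; the names x, w, loss, train_op and the loop length are hypothetical. All ops are built once, the graph is finalized, and the loop only runs existing ops. The op count before and after the loop is compared to confirm the graph did not grow.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style graph API

graph = tf.Graph()
with graph.as_default():
    # Build every op once, before the training loop (toy model, hypothetical names).
    x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
    w = tf.get_variable("w", shape=[3, 1], initializer=tf.zeros_initializer())
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - 1.0))
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    grads_and_vars = optimizer.compute_gradients(loss)  # adds ops, so call it once
    train_op = optimizer.apply_gradients(grads_and_vars)
    init_op = tf.global_variables_initializer()

# After this call, any attempt to add an op raises RuntimeError.
graph.finalize()
ops_before = len(graph.get_operations())

with tf.Session(graph=graph) as session:
    session.run(init_op)
    for step in range(100):
        feed = {x: np.random.rand(8, 3).astype(np.float32)}
        # Only existing ops are run here; nothing new is created.
        session.run([train_op, loss], feed_dict=feed)
        # Reading static shapes is plain Python/NumPy work on existing tensors.
        zeros = [np.zeros(g.shape.as_list(), np.float32) for g, _ in grads_and_vars]

ops_after = len(graph.get_operations())
print("ops before loop:", ops_before, "ops after loop:", ops_after)

# Show what happens when code tries to grow a finalized graph.
finalize_blocked = False
try:
    with graph.as_default():
        tf.constant(1.0)
except RuntimeError:
    finalize_blocked = True
print("adding an op after finalize() was blocked:", finalize_blocked)
```

If the op count keeps rising in your own program when you leave out graph.finalize(), some call inside the loop is building new ops; with finalize() in place, the RuntimeError traceback points at that call.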

Vincent Renkens answered Oct 16 '22 20:10
