 

When does TensorFlow update weights and biases?

When does TensorFlow update the weights and biases in the for loop?

Below is the code from TensorFlow's GitHub repository, mnist_softmax.py:

for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
  1. When does TensorFlow update the weights and biases?
  2. Does it update them when running sess.run()? If so, does that mean that, in this program, TensorFlow updates the weights and biases 1000 times?
  3. Or does it update them after finishing the whole for loop?
  4. If 2. is correct, my next question is: does TensorFlow update the model using different training data every time (since it uses next_batch(100))? That would be 1000*100 training data points in total, with each data point considered only once. Am I correct, or did I misunderstand something?
  5. If 3. is correct, is it weird that the model was trained after just one update step? I think I must be misunderstanding something. It would be really great if anyone could give me a hint or point me to some material.
Asked Feb 24 '17 by Lion Lai


1 Answer

  1. TensorFlow updates the weights every time you run train_step.
  2. Yes, the weights are updated 1000 times in this program (the sketch after this list demonstrates this).
  3. See above.
  4. Yes, you are correct: each call loads a mini-batch of 100 points and uses it to compute the gradients.
  5. It's not weird at all. You don't necessarily need to see the same data again and again; all that is required is enough data for the network to converge. You can iterate over the same data multiple times if you want, but since this model doesn't have many parameters, it converges within a single epoch.
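To see this concretely, here is a minimal sketch (TF 1.x style, using a made-up toy regression problem rather than MNIST) that prints a weight variable after each sess.run(train_step). The values change on every iteration, confirming that one run of train_step performs exactly one update:

import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([2, 1]))
y = tf.matmul(x, W)
loss = tf.reduce_mean(tf.square(y - y_))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(3):
        xs = np.random.rand(4, 2).astype(np.float32)
        ys = np.random.rand(4, 1).astype(np.float32)
        sess.run(train_step, feed_dict={x: xs, y_: ys})
        print(i, sess.run(W).ravel())  # W has changed after every run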

TensorFlow works by creating a graph of the computations required to produce the output of a network. Basic operations such as matrix multiplication and addition are nodes in this computation graph. In the TensorFlow MNIST example you are following, lines 40-46 define the network architecture:

  • x: placeholder
  • y_: placeholder
  • W: Variable - This is learnt during training
  • b: Variable - This is also learnt during training

The network represents a simple linear regression model where the prediction is made using y = W*x + b (see line 43).
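For reference, the graph-construction code in mnist_softmax.py looks roughly like this (exact line numbers and details may differ between versions):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # input images, flattened to 784 pixels
W = tf.Variable(tf.zeros([784, 10]))          # weights, learned during training
b = tf.Variable(tf.zeros([10]))               # biases, learned during training
y = tf.matmul(x, W) + b                       # the model's prediction (logits)
y_ = tf.placeholder(tf.float32, [None, 10])   # true labels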

Next, you configure the training procedure for the network. This code uses cross-entropy as the loss function to minimize (see line 57), and the minimization is done with the gradient descent algorithm (see line 59).
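Continuing the sketch above (with y and y_ as defined there), the loss and optimizer are set up roughly as follows:

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)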

At this point, your network is fully constructed. Now you need to run these nodes so that the actual computation is performed (no computation has happened up to this point).
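This deferred-execution behavior is easy to verify with a tiny standalone example (TF 1.x semantics):

import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b              # only adds a multiplication node to the graph

print(c)               # prints a Tensor object, not 6.0: nothing has run yet

with tf.Session() as sess:
    print(sess.run(c)) # 6.0 -- the computation happens only inside run()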

In the loop, when sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) is executed, TensorFlow computes the value of train_step, which causes the GradientDescentOptimizer to take one step toward minimizing cross_entropy. This is how training progresses.
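Under the hood, minimize() is shorthand for two steps: computing the gradients and applying them to the variables. A sketch, reusing cross_entropy from the snippet above:

opt = tf.train.GradientDescentOptimizer(0.5)
grads_and_vars = opt.compute_gradients(cross_entropy)  # pairs of (dL/dv, v) for W and b
train_step = opt.apply_gradients(grads_and_vars)       # v <- v - 0.5 * dL/dv

# Each sess.run(train_step, feed_dict=...) executes exactly one such update,
# so the loop of 1000 iterations performs 1000 weight updates.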

Answered Sep 21 '22 by lakshayg