In setting up the model I sometimes see the code:
# Scenario 1
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
logits=logits, labels=Y))
or
# Scenario 2
# Evaluate model (with test logits, for dropout to be disabled)
prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
The definition of tf.reduce_mean states that it "calculates the mean of tensor elements along various dimensions of the tensor." What does that mean in simpler language, and when do we need to use it, ideally with reference to Scenario 1 and Scenario 2 above? Thank you.
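As far as I understand, tensorflow.reduce_mean behaves like numpy.mean. It creates an operation in the underlying TensorFlow graph which computes the mean of a tensor. One difference to keep in mind is that the result keeps the dtype of the input, so the mean of an integer tensor is itself an integer.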
The most important keyword argument of tensorflow.reduce_mean is axis. Basically, if you have a tensor with shape (4, 3, 2) and axis=1, an empty array with shape (4, 2) will be created, and the mean values along the selected axis will be computed to fill in the empty array. (This is just a pseudo-process to help you make sense of the output, but may not be the actual process)
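The pseudo-process described above can be sketched in plain Python with nested lists. This is only an illustration of the idea, not how TensorFlow implements it; the variable names and the sample values 0..23 are assumptions made for the example.

```python
# A nested list standing in for a tensor of shape (4, 3, 2), filled with 0..23.
x = [[[float(i * 6 + j * 2 + k) for k in range(2)]
      for j in range(3)]
     for i in range(4)]

# Step 1: allocate the "empty" result with axis 1 removed, i.e. shape (4, 2).
result = [[0.0 for _ in range(2)] for _ in range(4)]

# Step 2: fill each cell with the mean of the 3 values along axis 1.
for i in range(4):
    for k in range(2):
        result[i][k] = sum(x[i][j][k] for j in range(3)) / 3

print(len(result), len(result[0]))  # the two remaining dimensions
print(result)
```

Each output cell averages the three entries that share the same first and last index, which is why the middle dimension disappears from the shape.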
Here is a simple example to help you understand
import tensorflow as tf
import numpy as np
one = np.linspace(1, 30, 30).reshape(5, 3, 2)
x = tf.placeholder('float32', shape=[5, 3, 2])
op_1 = tf.reduce_mean(x)
op_2 = tf.reduce_mean(x, axis=0)
op_3 = tf.reduce_mean(x, axis=1)
op_4 = tf.reduce_mean(x, axis=2)
with tf.Session() as sess:
    print(sess.run(op_1, feed_dict={x: one}))
    print(sess.run(op_2, feed_dict={x: one}))
    print(sess.run(op_3, feed_dict={x: one}))
    print(sess.run(op_4, feed_dict={x: one}))
The first output is a number because we didn't provide an axis. The shapes of the rest of the outputs are (3, 2), (5, 2) and (5, 3), respectively.
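The shape rule behind these outputs is simply "drop the reduced axis". A small sketch follows; reduced_shape is a hypothetical helper written for this illustration, not a TensorFlow function, and its keepdims flag only mimics the idea of the keepdims argument that tf.reduce_mean accepts.

```python
def reduced_shape(shape, axis=None, keepdims=False):
    """Shape of a mean taken over `axis` of a tensor with the given shape.

    Hypothetical helper for illustration only.
    """
    if axis is None:
        # Reducing over every axis leaves a scalar (or all-ones shape).
        return tuple(1 for _ in shape) if keepdims else ()
    axis = axis % len(shape)  # allow negative axes such as -1
    if keepdims:
        return shape[:axis] + (1,) + shape[axis + 1:]
    return shape[:axis] + shape[axis + 1:]

shape = (5, 3, 2)
for axis in (None, 0, 1, 2):
    print(axis, reduced_shape(shape, axis))
```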
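reduce_mean is useful whenever the values you want to average are stored in a multi-dimensional tensor, for example a matrix with one row per example that you want to collapse into one number per row or into a single number.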
User @meTchaikovsky explained the general case of tf.reduce_mean. In both of your cases tf.reduce_mean simply works as an ordinary mean calculator, i.e., you are not taking the mean along any particular axis of a tensor; you simply divide the sum of the elements in a tensor by the number of elements.
Let's decode what exactly is happening in both cases. For both cases assume batch_size = 2 and num_classes = 5, meaning that there are two examples per batch and five classes.
Now for the first case, tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y) returns an array of shape (2,).
>>import numpy as np
>>import tensorflow as tf
>>sess= tf.InteractiveSession()
>>batch_size = 2
>>num_classes = 5
>>logits = np.random.rand(batch_size,num_classes)
>>print(logits)
[[0.94108451 0.68186329 0.04000461 0.25996487 0.50391948]
[0.22781201 0.32305269 0.93359371 0.22599208 0.05942905]]
>>labels = np.array([[1,0,0,0,0],[0,1,0,0,0]])
>>print(labels)
[[1 0 0 0 0]
[0 1 0 0 0]]
>>logits_ = tf.placeholder(dtype=tf.float32,shape=(batch_size,num_classes))
>>Y_ = tf.placeholder(dtype=tf.int32,shape=(batch_size,num_classes))
>>loss_op = tf.nn.softmax_cross_entropy_with_logits(logits=logits_, labels=Y_)
>>loss_per_example = sess.run(loss_op,feed_dict={Y_:labels,logits_:logits})
>>print(loss_per_example)
[1.2028817 1.6912657]
You can see that loss_per_example has shape (2,), one loss value per example. Taking the mean of these values gives the average loss over the batch, which serves as an estimate of the loss over the full dataset. Hence we calculate
>>loss_per_example_holder = tf.placeholder(dtype=tf.float32,shape=(batch_size,))
>>final_loss_per_batch = tf.reduce_mean(loss_per_example_holder)
>>final_loss = sess.run(final_loss_per_batch,feed_dict={loss_per_example_holder:loss_per_example})
>>print(final_loss)
1.4470737
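As a cross-check, the same two numbers can be reproduced by hand without TensorFlow. The sketch below assumes the standard definition of softmax cross-entropy (negative log of the softmax probability of the true class); softmax_cross_entropy is a helper name made up for this example. The results should agree with the session output above up to float32 rounding.

```python
import math

# Logits and one-hot labels copied from the session above.
logits = [
    [0.94108451, 0.68186329, 0.04000461, 0.25996487, 0.50391948],
    [0.22781201, 0.32305269, 0.93359371, 0.22599208, 0.05942905],
]
labels = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

def softmax_cross_entropy(row_logits, row_labels):
    """Cross-entropy between softmax(row_logits) and a label distribution."""
    # Subtract the max before exponentiating for numerical stability.
    shift = max(row_logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in row_logits))
    # log softmax(z)_i = z_i - log_sum_exp
    return -sum(y * (z - log_sum_exp) for z, y in zip(row_logits, row_labels))

# One loss per example, then the plain mean over the batch.
loss_per_example = [softmax_cross_entropy(z, y) for z, y in zip(logits, labels)]
batch_loss = sum(loss_per_example) / len(loss_per_example)

print(loss_per_example)
print(batch_loss)
```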
Coming to your second case:
>>predictions_holder = tf.placeholder(dtype=tf.float32,shape=(batch_size,num_classes))
>>labels_holder = tf.placeholder(dtype=tf.int32,shape=(batch_size,num_classes))
>>prediction_tf = tf.equal(tf.argmax(predictions_holder, 1), tf.argmax(labels_holder, 1))
>>labels_match = sess.run(prediction_tf,feed_dict={predictions_holder:logits,labels_holder:labels})
>>print(labels_match)
[ True False]
This output is expected: in the first example the highest logit (0.9410) is at index 0, which matches the label, while in the second example the highest logit (0.9336) is at index 2 but the label is at index 1. Accuracy is the fraction of examples that match, so we take the mean of labels_match after casting it to floats.
>>labels_match_holder = tf.placeholder(dtype=tf.bool,shape=(batch_size,))
>>accuracy_calc = tf.reduce_mean(tf.cast(labels_match_holder, tf.float32))
>>accuracy = sess.run(accuracy_calc, feed_dict={labels_match_holder:labels_match})
>>print(accuracy)
0.5
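The accuracy step is just "cast the matches to 0.0/1.0 and average them". A plain-Python sketch using the same logits and labels as above (the argmax helper is written here for illustration):

```python
# Same data as in the session above.
logits = [
    [0.94108451, 0.68186329, 0.04000461, 0.25996487, 0.50391948],
    [0.22781201, 0.32305269, 0.93359371, 0.22599208, 0.05942905],
]
labels = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=row.__getitem__)

# True where the predicted class equals the labelled class.
labels_match = [argmax(z) == argmax(y) for z, y in zip(logits, labels)]

# Cast booleans to floats (True -> 1.0, False -> 0.0) and take the mean.
as_floats = [float(m) for m in labels_match]
accuracy = sum(as_floats) / len(as_floats)

print(labels_match)
print(accuracy)
```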