When and why do we use tf.reduce_mean?

Question

In setting up the model I sometimes see the code:

# Scenario 1
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))

or

# Scenario 2
# Evaluate model (with test logits, for dropout to be disabled)
prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))

The definition of tf.reduce_mean states that it "calculates the mean of tensor elements along various dimensions of the tensor." I am confused about what it does in simpler language? When do we need to use it, maybe with reference to # Scenario 1 & 2 ? Thank you

meTchaikovsky · Accepted Answer

As far as I understand, tensorflow.reduce_mean is the same as numpy.mean. It creates an operation in the underlying tensorflow graph which computes the mean of a tensor.

The most important keyword argument of tensorflow.reduce_mean is axis. Basically, if you have a tensor with shape (4, 3, 2) and axis=1, an empty array with shape (4, 2) will be created, and the mean values along the selected axis will be computed to fill in the empty array. (This is just a pseudo-process to help you make sense of the output, but may not be the actual process)

Here is a simple example to help you understand

import tensorflow as tf
import numpy as np

one = np.linspace(1, 30, 30).reshape(5, 3, 2)

x = tf.placeholder('float32', shape=[5, 3, 2])
op_1 = tf.reduce_mean(x)
op_2 = tf.reduce_mean(x, axis=0)
op_3 = tf.reduce_mean(x, axis=1)
op_4 = tf.reduce_mean(x, axis=2)

with tf.Session() as sess:
    print(sess.run(op_1, feed_dict={x: one}))
    print(sess.run(op_2, feed_dict={x: one}))
    print(sess.run(op_3, feed_dict={x: one}))
    print(sess.run(op_4, feed_dict={x: one}))

The first output is a number because we didn't provide an axis. The shapes of the rest of the outputs are (3, 2), (5, 2) and (5, 3), respectively.

reduce_mean can be useful when the target value is a matrix.

Ruthvik Vaila · Answer

User @meTchaikovsky explained the general case of tf.reduce_mean. In both of your cases tf.reduce_mean simply works as any mean calculator i.e,. you're not taking mean along any particular axis of a tensor, you simply divide the sum of the elements in a tensor by number of elements.

Let's decode what exactly is happening in both the cases. For the both the cases assume batch_size = 2 and num_classes = 5, meaning that there are two examples per batch. Now for the first case, tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y) returns an array of shape (2,).

>>import numpy as np
>>import tensorflow as tf
>>sess= tf.InteractiveSession()

>>batch_size = 2
>>num_classes = 5
>>logits = np.random.rand(batch_size,num_classes) 
>>print(logits)
[[0.94108451 0.68186329 0.04000461 0.25996487 0.50391948]
 [0.22781201 0.32305269 0.93359371 0.22599208 0.05942905]]
>>labels = np.array([[1,0,0,0,0],[0,1,0,0,0]])
>>print(labels)
[[1 0 0 0 0]
 [0 1 0 0 0]]
>>logits_ = tf.placeholder(dtype=tf.float32,shape=(batch_size,num_classes))
>>Y_ = tf.placeholder(dtype=tf.int32,shape=(batch_size,num_classes))
>>loss_op = tf.nn.softmax_cross_entropy_with_logits(logits=logits_, labels=Y_)
>>loss_per_example = sess.run(loss_op,feed_dict={Y_:labels,logits_:logits})
>>print(loss_per_example)
array([1.2028817, 1.6912657], dtype=float32)

You can see that loss_per_example is of shape (2,). If we take the mean of this variable then we can approximate the average loss for the full batch. Hence we calculate

>>loss_per_example_holder = tf.placeholder(dtype=tf.float32,shape=(batch_size))
>>final_loss_per_batch = tf.reduce_mean(loss_per_example_holder)
>>final_loss = sess.run(final_loss_per_batch,feed_dict={loss_per_example_holder:loss_per_example})  
>>print(final_loss)
1.4470737

Coming to your second case:

>>predictions_holder = tf.placeholder(dtype=tf.float32,shape=(batch_size,num_classes))
>>labels_holder = tf.placeholder(dtype=tf.int32,shape=(batch_size,num_classes))
>>prediction_tf = tf.equal(tf.argmax(predictions_holder, 1), tf.argmax(labels_holder, 1))
>>labels_match = sess.run(prediction_tf,feed_dict={predictions_holder:logits,labels_holder:labels})
>>print(labels_match)
[ True False]

The above output was expected because only the first example of the variable logits says that the neuron with highest activation (0.9410) is zeroth which is same as labels. Now we want to calculate the accuracy, which means we have to take the average of the variable labels_match.

>>labels_match_holder = tf.placeholder(dtype=tf.float32,shape=(batch_size))
>>accuracy_calc = tf.reduce_mean(tf.cast(labels_match_holder, tf.float32))
>>accuracy = sess.run(accuracy_calc, feed_dict={labels_match_holder:labels_match})
>>print(accuracy)
0.5

When and why do we use tf.reduce_mean?

Tags:

python

tensorflow

nilsinelabore

2 Answers

meTchaikovsky

Ruthvik Vaila

Recent Activity

Donate For Us

When and why do we use tf.reduce_mean?

Tags:

python

tensorflow

nilsinelabore

2 Answers

meTchaikovsky

Ruthvik Vaila

Related questions

Recent Activity

Donate For Us