UPDATE: Fixed in TensorFlow 1.14.0 (possibly earlier; I didn't check).
UPDATE: Still occurring in TensorFlow 1.7.0.
UPDATE: I wrote a Colab notebook that reproduces this bug on Google's GPU hardware: https://drive.google.com/file/d/13V87kSTyyFVMM7NoJNk9QTsCYS7FRbyz/view?usp=sharing
UPDATE: After wrongly accusing tf.gather in the first revisions of this question, I have now narrowed it down to tf.reduce_sum in combination with a placeholder as shape: tf.reduce_sum produces zeros (on GPU only) for large tensors whose shape depends on a placeholder.

Running the following code while feeding a large integer to the placeholder batch_size (> 700000 in my case):
import tensorflow as tf
import numpy as np

graph = tf.Graph()
with graph.as_default():
    # Shape depends on a scalar int32 placeholder
    batch_size = tf.placeholder(tf.int32, shape=[])
    ones_with_placeholder = tf.ones([batch_size, 256, 4])
    # Sum over the last axis: every entry should become 4.0
    sum_out = tf.reduce_sum(ones_with_placeholder, axis=2)
    min_sum_out = tf.reduce_min(sum_out)

sess = tf.Session(graph=graph)
sum_result, min_sum_result = sess.run([sum_out, min_sum_out], feed_dict={batch_size: 1000000})

print("Min value in sum_out processed on host with numpy:", np.min(sum_result))
print("Min value in sum_out tensor processed in graph with tf:", min_sum_result)
The following wrong result is shown:
Min value in sum_out processed on host with numpy: 0.0
Min value in sum_out tensor processed in graph with tf: 0.0
I was expecting that applying reduce_sum over axis 2 would result in 4.0 everywhere!
Running this exact code on the CPU gives correct results. Running it with a fixed shape for tf.ones also gives correct results, on both CPU and GPU:
ones_with_fixed_shape = tf.ones([1000000, 256, 4])
sum_out = tf.reduce_sum(ones_with_fixed_shape, axis=2)
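I verified the CPU path roughly like this (a sketch; the explicit tf.device('/cpu:0') pin is just one way to force CPU execution and was not part of my original runs):

import tensorflow as tf
import numpy as np

graph = tf.Graph()
with graph.as_default(), tf.device('/cpu:0'):
    # Same graph as above, but pinned to the CPU
    batch_size = tf.placeholder(tf.int32, shape=[])
    ones_with_placeholder = tf.ones([batch_size, 256, 4])
    sum_out = tf.reduce_sum(ones_with_placeholder, axis=2)

sess = tf.Session(graph=graph)
sum_result = sess.run(sum_out, feed_dict={batch_size: 1000000})
print("Min value on CPU:", np.min(sum_result))  # prints 4.0, as expected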
What is the problem with the placeholder on GPU?
The basic problem is that there is a speed/accuracy tradeoff. Even though your example seems trivial, with the entire tensor initialized to 1, there are 1.024B entries (1,000,000 x 256 x 4). Note that int32 can represent integers in the range [-2,147,483,648, 2,147,483,647] without loss of precision, but the tensor itself is float32, which can only represent integers exactly up to 2^24 = 16,777,216.
So we expect to see some error if we accumulate all of the entries and perform the computation in float32. This also explains why smaller matrices didn't exhibit the problem (smaller batch size).
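As a rough illustration of the accuracy side of that tradeoff (plain NumPy on the host, not the actual TF GPU kernel): a naive sequential float32 accumulation stops increasing once the running total reaches 2^24 = 16,777,216, because larger integers can no longer be represented exactly in float32.

import numpy as np

# float32 cannot represent integers above 2**24 exactly, so a naive
# sequential accumulation of ones saturates at 16777216.
print(np.float32(16777216.0) + np.float32(1.0))          # 16777216.0 -- the added 1.0 is lost
print(np.ones(20000000, dtype=np.float32).cumsum()[-1])  # 16777216.0, not 20000000.0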