
tf.reduce_sum on GPU fails in combination with placeholder as input shape


UPDATE: Fixed in TensorFlow 1.14.0 (possibly earlier; I didn't check)

UPDATE: Still occurring in TensorFlow 1.7.0

UPDATE: I wrote a Colab notebook that reproduces this bug on Google's GPU hardware: https://drive.google.com/file/d/13V87kSTyyFVMM7NoJNk9QTsCYS7FRbyz/view?usp=sharing

UPDATE: After wrongly blaming tf.gather in the first revisions of this question, I have now narrowed it down to tf.reduce_sum in combination with a placeholder as the shape:

tf.reduce_sum produces zeros (on GPU only) for large tensors whose shape depends on a placeholder.

Running the following code while feeding a large integer to the placeholder batch_size (>700000 in my case):

import tensorflow as tf
import numpy as np

graph = tf.Graph()
with graph.as_default():
    batch_size = tf.placeholder(tf.int32, shape=[])          # scalar placeholder for the leading dimension
    ones_with_placeholder = tf.ones([batch_size, 256, 4])    # shape depends on the placeholder
    sum_out = tf.reduce_sum(ones_with_placeholder, axis=2)   # sums four ones -> expected 4.0 everywhere
    min_sum_out = tf.reduce_min(sum_out)

sess = tf.Session(graph=graph)

sum_result, min_sum_result = sess.run([sum_out, min_sum_out], feed_dict={batch_size: 1000000})
print("Min value in sum_out processed on host with numpy:", np.min(sum_result))
print("Min value in sum_out tensor processed in graph with tf:", min_sum_result)

The following incorrect result is printed:

Min value in sum_out processed on host with numpy: 0.0
Min value in sum_out tensor processed in graph with tf: 0.0

I was expecting that applying reduce_sum over axis 2 would result in 4.0 everywhere!

Running this exact code on the CPU leads to correct results. Running it with a fixed shape for tf.ones also leads to correct results on both CPU and GPU:

ones_with_fixed_shape = tf.ones([1000000, 256, 4])
sum_out = tf.reduce_sum(ones_with_fixed_shape, axis=2)
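
As a side note, here is a minimal sketch for comparing the two devices from the same script (assuming TF 1.x and a visible GPU; the explicit tf.device pinning is not part of the original repro):

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    batch_size = tf.placeholder(tf.int32, shape=[])
    # Pin the ops explicitly; swap '/cpu:0' for '/gpu:0' to compare devices.
    with tf.device('/cpu:0'):
        ones_with_placeholder = tf.ones([batch_size, 256, 4])
        sum_out = tf.reduce_sum(ones_with_placeholder, axis=2)
        min_sum_out = tf.reduce_min(sum_out)

with tf.Session(graph=graph) as sess:
    # Expected: 4.0 on the CPU; the question reports 0.0 when the ops run on the GPU.
    print(sess.run(min_sum_out, feed_dict={batch_size: 1000000}))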

What is the problem with the placeholder on GPU?

Asked Mar 27 '18 by sdnr

1 Answer

The basic problem is that there is a speed/accuracy tradeoff. Even though your example seems trivial, with the entire tensor initialized to 1, it contains 1.024 billion entries. Note that int32 can represent integers in the range [-2,147,483,648, 2,147,483,647] without loss of precision.

So we expect to see some error when all of those entries are accumulated in a single computation. This also explains why smaller tensors (smaller batch sizes) did not exhibit the problem.
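
As a rough illustration of that kind of accumulation error (a NumPy sketch added here, not part of the original answer; note that the values in sum_out are float32, whose exact-integer range tops out at 2**24, far below the int32 range quoted above):

import numpy as np

# float32 has a 24-bit significand, so 2**24 (16,777,216) is the largest
# integer it can count to exactly; adding 1 beyond that is lost to rounding.
x = np.float32(2**24)
print(x + np.float32(1) == x)              # True

# A sequential float32 accumulation therefore stalls at 2**24:
ones = np.ones(2**24 + 10, dtype=np.float32)
print(ones.cumsum(dtype=np.float32)[-1])   # 16777216.0, not 16777226.0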

Answered Oct 14 '22 by Prakhar Agarwal