Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Differentiable round function in Tensorflow?

So the output of my network is a list of propabilities, which I then round using tf.round() to be either 0 or 1, this is crucial for this project. I then found out that tf.round isn't differentiable so I'm kinda lost there.. :/

like image 707
ronsap123 Avatar asked Oct 06 '17 00:10

ronsap123


4 Answers

Something along the lines of x - sin(2pi x)/(2pi)?

I'm sure there's a way to squish the slope to be a bit steeper.

enter image description here

like image 92
Tiana Avatar answered Nov 17 '22 19:11

Tiana


You can use the fact that tf.maximum() and tf.minimum() are differentiable, and the inputs are probabilities from 0 to 1

# round numbers less than 0.5 to zero;
# by making them negative and taking the maximum with 0
differentiable_round = tf.maximum(x-0.499,0)
# scale the remaining numbers (0 to 0.5) to greater than 1
# the other half (zeros) is not affected by multiplication
differentiable_round = differentiable_round * 10000
# take the minimum with 1
differentiable_round = tf.minimum(differentiable_round, 1)

Example:

[0.1,       0.5,     0.7]
[-0.0989, 0.001, 0.20099] # x - 0.499
[0,       0.001, 0.20099] # max(x-0.499, 0)
[0,          10,  2009.9] # max(x-0.499, 0) * 10000
[0,         1.0,     1.0] # min(max(x-0.499, 0) * 10000, 1)
like image 11
icerider Avatar answered Nov 17 '22 19:11

icerider


This works for me:

x_rounded_NOT_differentiable = tf.round(x)
x_rounded_differentiable = x - tf.stop_gradient(x - x_rounded_NOT_differentiable)
like image 6
Tom Dörr Avatar answered Nov 17 '22 20:11

Tom Dörr


Rounding is a fundamentally nondifferentiable function, so you're out of luck there. The normal procedure for this kind of situation is to find a way to either use the probabilities, say by using them to calculate an expected value, or by taking the maximum probability that is output and choose that one as the network's prediction. If you aren't using the output for calculating your loss function though, you can go ahead and just apply it to the result and it doesn't matter if it's differentiable. Now, if you want an informative loss function for the purpose of training the network, maybe you should consider whether keeping the output in the format of probabilities might actually be to your advantage (it will likely make your training process smoother)- that way you can just convert the probabilities to actual estimates outside of the network, after training.

like image 4
deftfyodor Avatar answered Nov 17 '22 20:11

deftfyodor