Let's say a distribution is defined as below:
dist = tf.contrib.distributions.Normal(mu, sigma)
and a sample is drawn from the distribution:
val = dist.pdf(x)
and this value is used in a model to predict a variable:
X_hat = f(val)
loss = tf.norm(X_pred - X_hat, ord=2)
If I want to optimize the variables mu and sigma to reduce my prediction error, can I do the following?
train = tf.train.AdamOptimizer(1e-03).minimize(loss, var_list=[mu, sigma])
I am interested in knowing whether the gradient routines are propagated through the normal distribution, or whether I should expect issues because I am taking gradients over the parameters defining a distribution.
TensorFlow "records" relevant operations executed inside the context of a tf. GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode differentiation.
It's only differentiable w.r.t. self. y but not the integer/discrete elements of self. actions_array.
The gradients are the partial derivatives of the loss with respect to each of the six variables. TensorFlow presents the gradient and the variable of which it is the gradient, as members of a tuple inside a list. We display the shapes of each of the gradients and variables to check that is actually the case.
tl;dr: Yes, gradient backpropagation will work correctly with tf.distributions.Normal.

dist.pdf(x) does not draw a sample from the distribution, but rather returns the probability density function evaluated at x. This is probably not what you wanted.
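To make the distinction concrete, here is a minimal sketch assuming the TF 1.x graph API used in the question (note that in tf.distributions.Normal the method is called prob rather than pdf):
import tensorflow as tf  # TF 1.x

mu = tf.Variable(0.0)
sigma = tf.Variable(1.0)
dist = tf.distributions.Normal(loc=mu, scale=sigma)

density = dist.prob(0.5)  # value of the density at x = 0.5: deterministic given mu and sigma
sample = dist.sample()    # a random draw from N(mu, sigma): changes every time it is evaluated

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(density), sess.run(density))  # identical values
    print(sess.run(sample), sess.run(sample))    # (almost surely) different values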
To get a random sample, what you really want is to call dist.sample(). For many random distributions, the dependency of a random sample on the parameters is nontrivial and will not necessarily be backpropable. However, as @Richard_wth pointed out, specifically for the normal distribution, it is possible through reparametrization to get a simple dependency on the location and scale parameters (mu and sigma).
In fact, in the implementation of tf.contrib.distributions.Normal (recently migrated to tf.distributions.Normal), that is exactly how sample is implemented:
def _sample_n(self, n, seed=None):
  ...
  # Draw from a standard normal, then scale and shift it (the reparametrization trick),
  # so the result is differentiable w.r.t. self.scale and self.loc.
  sampled = random_ops.random_normal(shape=shape, mean=0., stddev=1., ...)
  return sampled * self.scale + self.loc
Consequently, if you provide scale and location parameters as tensors, then backpropagation will work correctly on those tensors.
Note that this backpropagation is inherently stochastic: the gradient depends on the particular draw of the standard normal variable. However, in the long run (over many training examples), this is likely to work as you expect.
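Putting this together, here is a minimal end-to-end sketch under the question's TF 1.x setup; f and X_pred are not specified in the question, so the linear model and the constant target below are purely hypothetical stand-ins:
import tensorflow as tf  # TF 1.x graph API

# Trainable distribution parameters.
mu = tf.Variable(0.0)
sigma = tf.Variable(1.0)  # in practice you may want to keep this positive, e.g. via tf.nn.softplus

dist = tf.distributions.Normal(loc=mu, scale=sigma)
val = dist.sample([3])  # reparametrized internally: standard normal draw * sigma + mu

# Hypothetical stand-ins for f and X_pred from the question.
X_hat = 2.0 * val + 1.0
X_pred = tf.constant([1.0, 2.0, 3.0])
loss = tf.norm(X_pred - X_hat, ord=2)

train = tf.train.AdamOptimizer(1e-03).minimize(loss, var_list=[mu, sigma])

# Sanity check: gradients w.r.t. the distribution parameters exist (are not None).
grads = tf.gradients(loss, [mu, sigma])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))  # finite values that change from run to run, since the sample is random
    for _ in range(100):
        sess.run(train)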