Does tensorflow propagate gradients through a pdf

Tags:

tensorflow

Let's say a distribution is defined as below:

dist = tf.contrib.distributions.Normal(mu, sigma)

and a sample is drawn from the distribution

val = dist.pdf(x)

and this value is used in a model to predict a variable

X_hat = f(val)
loss = tf.norm(X_pred-X_hat, ord=2)

and if I want to optimize the variables mu and sigma to reduce my prediction error can I do the following?

train = tf.train.AdamOptimizer(1e-03).minimize(loss, var_list=[mu, sigma])

I am interested in knowing whether the gradients are propagated through the normal distribution, or whether I should expect issues because I am taking gradients with respect to the parameters defining the distribution.

asked Apr 08 '18 by knk


1 Answer

tl;dr: Yes, gradient back propagation will work correctly with tf.distributions.Normal.

dist.pdf(x) does not draw a sample from the distribution; it returns the value of the probability density function evaluated at x. This is probably not what you wanted.

To get a random sample, what you really want is to call dist.sample(). For many random distributions, the dependency of a random sample on the parameters is nontrivial and will not necessarily be backpropable.
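As a quick illustration, here is a minimal sketch (assuming TF 1.x graph mode; note that in tf.distributions.Normal the density method is called prob rather than pdf) showing the difference between evaluating the density and drawing a sample:

import tensorflow as tf

mu = tf.Variable(0.0)
sigma = tf.Variable(1.0)
dist = tf.distributions.Normal(loc=mu, scale=sigma)

x = tf.constant([0.5, 1.0, 1.5])

density = dist.prob(x)   # density evaluated at x -- what dist.pdf(x) gave you
sample = dist.sample(3)  # three random draws from N(mu, sigma) -- what you probably want

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([density, sample]))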

However, as @Richard_wth pointed out, specifically for the normal distribution, it is possible through reparametrization to get a simple dependency on the location and scale parameters (mu and sigma).

In fact, in the implementation of tf.contrib.distributions.Normal (recently migrated to tf.distributions.Normal), that is exactly how sample is implemented:

def _sample_n(self, n, seed=None):
  ...
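  # reparameterization: draw eps ~ N(0, 1), then shift and scale it by the distribution's parameters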
  sampled = random_ops.random_normal(shape=shape, mean=0., stddev=1., ...)
  return sampled * self.scale + self.loc

Consequently, if you provide scale and location parameters as tensors, then backpropagation will work correctly on those tensors.

Note that this backpropagation is inherently stochastic: the gradient will vary depending on the random draw of the standard Gaussian variable. However, in the long run (over many training examples), this is likely to work as you expect.
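Putting it together, here is a minimal sketch (TF 1.x style, with a trivial linear stand-in for your f and a dummy target) that checks the gradients actually reach mu and sigma:

import tensorflow as tf

mu = tf.Variable(0.0)
sigma = tf.Variable(1.0)
dist = tf.distributions.Normal(loc=mu, scale=sigma)

val = dist.sample(10)                    # reparameterized draws: mu + sigma * eps
X_hat = 2.0 * val + 1.0                  # stand-in for f(val); use your real model here
X_pred = tf.ones(10)                     # stand-in target
loss = tf.norm(X_pred - X_hat, ord=2)

grads = tf.gradients(loss, [mu, sigma])  # both entries should be tensors, not None
train = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=[mu, sigma])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))               # gradients flow through the sampling op
    sess.run(train)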

answered Oct 21 '22 by Zvika