 

Can you reverse a PyTorch neural network and activate the inputs from the outputs?

Can we activate the outputs of a NN to gain insight into how the neurons are connected to input features?

If I take a basic NN example from the PyTorch tutorials, here is a model trained on (x, y) examples to learn the mapping from x to y:

import torch

# batch size, input dimension, hidden dimension, output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    y_pred = model(x)              # forward pass
    loss = loss_fn(y_pred, y)
    model.zero_grad()
    loss.backward()                # compute gradients
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad  # manual gradient-descent update

After I've finished training the network to predict y from x inputs, is it possible to reverse the trained NN so that it can now predict x from y inputs?

I don't expect the recovered inputs to match the original x that produced the y outputs. Rather, I expect to see which features the model activates on to relate x and y.

If it is possible, then how do I rearrange the Sequential model without breaking all the weights and connections?

asked Jan 23 '20 by Reactgular
People also ask

Can you run a neural network backwards?

You can definitely run a neural network "in reverse".

How does PyTorch forward work?

The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.
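A minimal illustration of that split (the tensors and function below are invented for illustration, not taken from the question):

import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()   # forward: build an output tensor from the input tensor
y.backward()         # backward: compute the gradient of y w.r.t. x
print(x.grad)        # equals 2 * x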


2 Answers

It is possible but only for very special cases. For a feed-forward network (Sequential) each of the layers needs to be reversible; that means the following arguments apply to each layer separately. The transformation associated with one layer is y = activation(W*x + b) where W is the weight matrix and b the bias vector. In order to solve for x we need to perform the following steps:

  1. Invert the activation; not all activation functions have an inverse though. For example, ReLU does not have an inverse on (-inf, 0) because it maps that whole range to zero. If we used tanh, on the other hand, we can use its inverse, which is artanh(x) = 0.5 * log((1 + x) / (1 - x)).
  2. Solve W*x = inverse_activation(y) - b for x; for a unique solution to exist, W must be square (the same number of rows and columns, i.e. the layer has as many inputs as outputs) and det(W) must be non-zero. We can control the former by choosing a specific network architecture, while the latter depends on the training process.

So for a neural network to be reversible it must have a very specific architecture: all layers must have the same number of input and output neurons (i.e. square weight matrices) and the activation functions all need to be invertible.

Code: Using PyTorch we will have to do the inversion of the network manually, both in terms of solving the system of linear equations and in terms of finding the inverse activation function. Consider the following example of a 1-layer neural network (since the steps apply to each layer separately, extending this to more than one layer is straightforward):

import torch

N = 10  # number of samples
n = 3   # number of neurons per layer

x = torch.randn(N, n)

model = torch.nn.Sequential(
    torch.nn.Linear(n, n), torch.nn.Tanh()
)

y = model(x)

z = y  # use 'z' for the reverse result, starting from the model's output 'y'
for step in list(model.children())[::-1]:
    if isinstance(step, torch.nn.Linear):
        z = z - step.bias[None, ...]
        z = z[..., None]  # the solver expects N column vectors, i.e. shape (N, n, 1)
        # 'torch.solve' has been removed from recent PyTorch; 'torch.linalg.solve'
        # solves W*x = z for each sample, broadcasting the (n, n) weight matrix.
        z = torch.linalg.solve(step.weight, z)
        z = torch.squeeze(z)  # remove the extra dimension added for the solver
    elif isinstance(step, torch.nn.Tanh):
        z = 0.5 * torch.log((1 + z) / (1 - z))  # artanh, the inverse of tanh

print('Agreement between x and z: ', torch.dist(x, z))
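Since the loop simply walks over model.children() in reverse, the same code also inverts a deeper network, as long as every layer keeps the square-and-invertible constraints. A sketch (this particular architecture is only an illustration):

deep_model = torch.nn.Sequential(
    torch.nn.Linear(n, n), torch.nn.Tanh(),
    torch.nn.Linear(n, n), torch.nn.Tanh(),
)

y = deep_model(x)

z = y
for step in list(deep_model.children())[::-1]:
    if isinstance(step, torch.nn.Linear):
        # undo the affine map: solve W*x = z - b for each sample
        z = torch.linalg.solve(step.weight, (z - step.bias)[..., None]).squeeze(-1)
    elif isinstance(step, torch.nn.Tanh):
        z = 0.5 * torch.log((1 + z) / (1 - z))   # artanh

print('Agreement for the deeper model: ', torch.dist(x, z))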
answered Oct 02 '22 by a_guest

If I've understood correctly, there are two questions here:

  1. Is it possible to determine what features in the input have activated neurons?

  2. If so, is it possible to use this information to generate samples from p(x|y)?

Regarding 1, a basic way to determine whether a neuron is sensitive to an input feature x_i is to compute the gradient of that neuron's output w.r.t. x_i. A high gradient indicates sensitivity to that particular input element. There is a rich literature on the subject; for example, you can have a look at guided backpropagation or at Grad-CAM (the latter is about classification with convnets, but it contains useful ideas).
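A rough sketch of that gradient computation (the model, sample, and neuron index below are placeholders, not taken from the question):

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

x = torch.randn(1, 1000, requires_grad=True)  # a single input sample
y = model(x)

neuron = 3                        # output neuron to inspect (arbitrary choice)
y[0, neuron].backward()           # gradient of this neuron's output w.r.t. the input

sensitivity = x.grad[0]           # one gradient value per input feature x_i
print(sensitivity.abs().topk(5))  # the features this neuron reacts to most strongly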

As for 2, I don't think that your approach to "reversing the problem" is correct. The problem is that your network is discriminative and what it outputs can be seen as argmax_y p(y|x). Note that this is a point-wise estimation, not a full modeling of the distribution. However, the inverse problem that you're interested in seems to be sampling from

p(x|y) = constant * p(y|x) * p(x).

You don't know how to sample from p(y|x) and you don't know anything about p(x). Even if you use a method to discover correlations between the neurons and specific input features, you have only discovered which features were most important to the network's prediction, and depending on the nature of y this might be insufficient. Consider a toy example where your inputs x are 2d points distributed according to some distribution in R^2 and where the output y is binary, such that any (a,b) in R^2 is classified as 1 if a<1 and as 0 if a>1. Then a discriminative network could learn the vertical line a=1 as its decision boundary. Inspecting correlations between neurons and input features will reveal that only the first coordinate was useful in this prediction, but this information is not sufficient for sampling from the full 2d distribution of inputs.
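A small sketch of that toy example (data and threshold invented for illustration): a perfect classifier for this task only constrains the first coordinate, so the label tells you nothing about the second one.

import torch

points = torch.randn(1000, 2) * 2       # 2d inputs (a, b)
labels = (points[:, 0] < 1).float()     # y = 1 iff a < 1

positives = points[labels == 1]
print(positives[:, 0].max())   # first coordinate: always below 1, as the label dictates
print(positives[:, 1].std())   # second coordinate: still spread over its whole range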

I think that Variational autoencoders could be what you're looking for.

answered Oct 02 '22 by Ash