Suppose the vector \theta contains all the parameters of a neural network; I want to compute the Hessian matrix for \theta in PyTorch.
Suppose the network is as follows:
import torch
from torch.nn import Module

class Net(Module):
    def __init__(self, h, w):
        super(Net, self).__init__()
        self.c1 = torch.nn.Conv2d(1, 32, 3, 1, 1)
        self.f2 = torch.nn.Linear(32 * h * w, 5)

    def forward(self, x):
        x = self.c1(x)
        x = x.view(x.size(0), -1)
        x = self.f2(x)
        return x
I know the second derivative can be calculated by calling torch.autograd.grad() twice, but the parameters in PyTorch are organized by net.parameters(), and I don't know how to compute the Hessian for all of them.
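For a single tensor I know what calling grad() twice looks like, e.g. (a minimal illustrative sketch):

import torch

# second derivatives of a scalar w.r.t. one tensor, by calling grad() twice
w = torch.randn(3, requires_grad=True)
loss = (w ** 3).sum()
(g,) = torch.autograd.grad(loss, w, create_graph=True)   # first derivative: 3 * w**2
(row0,) = torch.autograd.grad(g[0], w)                    # Hessian row for w[0]: [6*w[0], 0, 0]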
I have tried to use torch.autograd.functional.hessian() in PyTorch 1.5 as follows:
import torch
import numpy as np
from torch.nn import Module
import torch.nn.functional as F

class Net(Module):
    def __init__(self, h, w):
        super(Net, self).__init__()
        self.c1 = torch.nn.Conv2d(1, 32, 3, 1, 1)
        self.f2 = torch.nn.Linear(32 * h * w, 5)

    def forward(self, x):
        x = self.c1(x)
        x = x.view(x.size(0), -1)
        x = self.f2(x)
        return x

def func_(a, b, c, d):
    p = [a, b, c, d]
    x = torch.randn(size=[8, 1, 12, 12], dtype=torch.float32)
    y = torch.randint(0, 5, [8])
    x = F.conv2d(x, p[0], p[1], 1, 1)
    x = x.view(x.size(0), -1)
    x = F.linear(x, p[2], p[3])
    loss = F.cross_entropy(x, y)
    return loss

if __name__ == '__main__':
    net = Net(12, 12)
    h = torch.autograd.functional.hessian(func_, tuple([_ for _ in net.parameters()]))
    print(type(h), len(h))
h is a tuple of tuples, and the blocks come out in strange shapes. For example, the shape of \frac{\partial^2 Loss}{\partial c1.weight^2} is [32, 1, 3, 3, 32, 1, 3, 3]. It seems like I could combine them into a complete H, but I don't know which part of the whole Hessian matrix each block is, or in what order they go.
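Inspecting one block suggests (my guess, a small sketch) that each block's shape is the first parameter's shape followed by the second parameter's shape, so a block can be flattened to 2-D:

block = h[0][0]                    # d^2 Loss / d c1.weight^2, shape [32, 1, 3, 3, 32, 1, 3, 3]
n = net.c1.weight.numel()          # 288
print(block.reshape(n, n).shape)   # torch.Size([288, 288])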
torch.autograd provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. It requires minimal changes to existing code: you only need to declare the Tensors for which gradients should be computed with the requires_grad=True keyword.
A Hessian matrix is a square matrix of second-order partial derivatives of a scalar, and it describes the local curvature of a multi-variable function. In the case of a neural network, the Hessian is a square matrix whose number of rows and columns equals the total number of parameters in the network.
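In symbols, for a scalar loss L and a flattened parameter vector \theta with n entries:

H_{ij} = \frac{\partial^2 L}{\partial \theta_i \, \partial \theta_j}, \qquad H \in \mathbb{R}^{n \times n}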
Here is one solution. I think it's a little too complex, but it could be instructive.

Consider these points about torch.autograd.functional.hessian(): the first argument must be a function, and the second argument should be a tuple or list of tensors. That means we cannot directly pass a scalar loss to it. (I don't know why, because I think there is no large difference between a scalar loss and a function that returns a scalar.) So here is the solution:
import torch
import numpy as np
from torch.nn import Module
import torch.nn.functional as F

class Net(Module):
    def __init__(self, h, w):
        super(Net, self).__init__()
        self.c1 = torch.nn.Conv2d(1, 32, 3, 1, 1)
        self.f2 = torch.nn.Linear(32 * h * w, 5)

    def forward(self, x):
        x = self.c1(x)
        x = x.view(x.size(0), -1)
        x = self.f2(x)
        return x

def haha(a, b, c, d):
    # reshape the flattened vectors back to the original parameter shapes
    p = [a.view(32, 1, 3, 3), b, c.view(5, 32 * 12 * 12), d]
    x = torch.randn(size=[8, 1, 12, 12], dtype=torch.float32)
    y = torch.randint(0, 5, [8])
    x = F.conv2d(x, p[0], p[1], 1, 1)
    x = x.view(x.size(0), -1)
    x = F.linear(x, p[2], p[3])
    loss = F.cross_entropy(x, y)
    return loss

if __name__ == '__main__':
    net = Net(12, 12)
    # pass each parameter as a flattened one-dimensional vector
    h = torch.autograd.functional.hessian(haha, tuple([_.view(-1) for _ in net.parameters()]))
    # Then we just need to fit the tensors in h into a big matrix
I build a new function haha that works in the same way as the neural network Net. Notice that the arguments a, b, c, d are all flattened into one-dimensional vectors, so that the tensors in h are all two-dimensional, in a clear order, and easy to combine into one large Hessian matrix.
In my example, the shapes of the tensors in h are:

# with respect to c1.weight and each of c1.weight, c1.bias, f2.weight, f2.bias
[288, 288]
[288, 32]
[288, 23040]
[288, 5]
# with respect to c1.bias and each of c1.weight, c1.bias, f2.weight, f2.bias
[32, 288]
[32, 32]
[32, 23040]
[32, 5]
...
So it is easy to see what each tensor means and which part of the whole Hessian it is. All we need to do is allocate a (288+32+23040+5)*(288+32+23040+5) matrix and copy the tensors in h into the corresponding locations.
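For example, the assembly step could look like this (a sketch, assuming h and net come from the code block above; each h[i][j] is already 2-D because the parameters were passed in as flattened vectors):

import torch

sizes = [p.numel() for p in net.parameters()]    # [288, 32, 23040, 5]
n = sum(sizes)                                   # 23365
H = torch.zeros(n, n)
offsets = [0]
for s in sizes:
    offsets.append(offsets[-1] + s)              # running start index of each block
for i in range(len(sizes)):
    for j in range(len(sizes)):
        H[offsets[i]:offsets[i] + sizes[i],
          offsets[j]:offsets[j] + sizes[j]] = h[i][j]
print(H.shape)                                   # torch.Size([23365, 23365])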
I think the solution could still be improved; for example, we shouldn't need to build a function that works the same way as the neural network, nor transform the shapes of the parameters twice. But for now I don't have a better idea; if there is a better solution, please let me know.
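One possible direction is the grad-twice approach the question mentions. A rough, untested sketch (my assumption, not part of the verified solution above): build the Hessian row by row directly from net.parameters(), with no duplicate function and no reshaping tricks. It assumes the Net class defined earlier.

import torch
import torch.nn.functional as F

net = Net(12, 12)
params = list(net.parameters())
x = torch.randn(8, 1, 12, 12)
y = torch.randint(0, 5, [8])
loss = F.cross_entropy(net(x), y)

grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])    # length 23365
n = flat_grad.numel()
H = torch.zeros(n, n)
for i in range(n):                                       # one backward pass per row: simple but slow
    row = torch.autograd.grad(flat_grad[i], params, retain_graph=True)
    H[i] = torch.cat([r.reshape(-1) for r in row])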