Suppose the vector \theta contains all the parameters of a neural network; I want to compute the Hessian matrix for \theta in PyTorch.
Suppose the network is as follows:
import torch
from torch.nn import Module

class Net(Module):
    def __init__(self, h, w):
        super(Net, self).__init__()
        self.c1 = torch.nn.Conv2d(1, 32, 3, 1, 1)
        self.f2 = torch.nn.Linear(32 * h * w, 5)

    def forward(self, x):
        x = self.c1(x)
        x = x.view(x.size(0), -1)
        x = self.f2(x)
        return x
I know the second derivative can be calculated by calling torch.autograd.grad() twice, but the parameters in PyTorch are organized by net.parameters(), and I don't know how to compute the Hessian for all of them.
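For a single tensor I know what calling grad() twice looks like, e.g. (a minimal illustrative sketch):

import torch

# second derivatives of a scalar w.r.t. one tensor, by calling grad() twice
w = torch.randn(3, requires_grad=True)
loss = (w ** 3).sum()
(g,) = torch.autograd.grad(loss, w, create_graph=True)   # first derivative: 3 * w**2
(row0,) = torch.autograd.grad(g[0], w)                    # Hessian row for w[0]: [6*w[0], 0, 0]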
I have tried to use torch.autograd.functional.hessian() in PyTorch 1.5 as follows:
import torch
import numpy as np
from torch.nn import Module
import torch.nn.functional as F

class Net(Module):
    def __init__(self, h, w):
        super(Net, self).__init__()
        self.c1 = torch.nn.Conv2d(1, 32, 3, 1, 1)
        self.f2 = torch.nn.Linear(32 * h * w, 5)

    def forward(self, x):
        x = self.c1(x)
        x = x.view(x.size(0), -1)
        x = self.f2(x)
        return x

def func_(a, b, c, d):
    p = [a, b, c, d]
    x = torch.randn(size=[8, 1, 12, 12], dtype=torch.float32)
    y = torch.randint(0, 5, [8])
    x = F.conv2d(x, p[0], p[1], 1, 1)
    x = x.view(x.size(0), -1)
    x = F.linear(x, p[2], p[3])
    loss = F.cross_entropy(x, y)
    return loss

if __name__ == '__main__':
    net = Net(12, 12)
    h = torch.autograd.functional.hessian(func_, tuple([_ for _ in net.parameters()]))
    print(type(h), len(h))
h is a tuple of tuples, and the blocks come out in strange shapes. For example, the shape of \frac{\partial^2 Loss}{\partial c1.weight^2} is [32, 1, 3, 3, 32, 1, 3, 3]. It seems like I could combine them into a complete H, but I don't know which part of the whole Hessian matrix each block is, or in what order they go.
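Inspecting one block suggests (my guess, a small sketch) that each block's shape is the first parameter's shape followed by the second parameter's shape, so a block can be flattened to 2-D:

block = h[0][0]                    # d^2 Loss / d c1.weight^2, shape [32, 1, 3, 3, 32, 1, 3, 3]
n = net.c1.weight.numel()          # 288
print(block.reshape(n, n).shape)   # torch.Size([288, 288])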
torch.autograd provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. It requires minimal changes to existing code: you only need to declare the Tensors for which gradients should be computed with the requires_grad=True keyword.
A Hessian matrix is a square matrix of second-order partial derivatives of a scalar, and it describes the local curvature of a multi-variable function. In the case of a neural network, the Hessian is a square matrix whose number of rows and columns equals the total number of parameters in the network.
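In symbols, for a scalar loss L and a flattened parameter vector \theta with n entries:

H_{ij} = \frac{\partial^2 L}{\partial \theta_i \, \partial \theta_j}, \qquad H \in \mathbb{R}^{n \times n}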
Here is one solution. I think it's a little too complex, but it could be instructive.

Consider these points about torch.autograd.functional.hessian(): the first argument must be a function, and the second argument should be a tuple or list of tensors. That means we cannot directly pass a scalar loss to it. (I don't know why, because I think there is no large difference between a scalar loss and a function that returns a scalar.) So here is the solution:
import torch
import numpy as np
from torch.nn import Module
import torch.nn.functional as F

class Net(Module):
    def __init__(self, h, w):
        super(Net, self).__init__()
        self.c1 = torch.nn.Conv2d(1, 32, 3, 1, 1)
        self.f2 = torch.nn.Linear(32 * h * w, 5)

    def forward(self, x):
        x = self.c1(x)
        x = x.view(x.size(0), -1)
        x = self.f2(x)
        return x

def haha(a, b, c, d):
    # reshape the flattened vectors back to the original parameter shapes
    p = [a.view(32, 1, 3, 3), b, c.view(5, 32 * 12 * 12), d]
    x = torch.randn(size=[8, 1, 12, 12], dtype=torch.float32)
    y = torch.randint(0, 5, [8])
    x = F.conv2d(x, p[0], p[1], 1, 1)
    x = x.view(x.size(0), -1)
    x = F.linear(x, p[2], p[3])
    loss = F.cross_entropy(x, y)
    return loss

if __name__ == '__main__':
    net = Net(12, 12)
    # pass each parameter as a flattened one-dimensional vector
    h = torch.autograd.functional.hessian(haha, tuple([_.view(-1) for _ in net.parameters()]))
    # Then we just need to fit the tensors in h into a big matrix
I build a new function haha that works in the same way as the neural network Net. Notice that the arguments a, b, c, d are all flattened into one-dimensional vectors, so that the tensors in h are all two-dimensional, in a clear order, and easy to combine into one large Hessian matrix.
In my example, the shapes of the tensors in h are:

# with respect to c1.weight and each of c1.weight, c1.bias, f2.weight, f2.bias
[288, 288]
[288, 32]
[288, 23040]
[288, 5]
# with respect to c1.bias and each of c1.weight, c1.bias, f2.weight, f2.bias
[32, 288]
[32, 32]
[32, 23040]
[32, 5]
...
So it is easy to see what each tensor means and which part of the whole Hessian it is. All we need to do is allocate a (288+32+23040+5)*(288+32+23040+5) matrix and copy the tensors in h into the corresponding locations.
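For example, the assembly step could look like this (a sketch, assuming h and net come from the code block above; each h[i][j] is already 2-D because the parameters were passed in as flattened vectors):

import torch

sizes = [p.numel() for p in net.parameters()]    # [288, 32, 23040, 5]
n = sum(sizes)                                   # 23365
H = torch.zeros(n, n)
offsets = [0]
for s in sizes:
    offsets.append(offsets[-1] + s)              # running start index of each block
for i in range(len(sizes)):
    for j in range(len(sizes)):
        H[offsets[i]:offsets[i] + sizes[i],
          offsets[j]:offsets[j] + sizes[j]] = h[i][j]
print(H.shape)                                   # torch.Size([23365, 23365])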
I think the solution could still be improved; for example, we shouldn't need to build a function that works the same way as the neural network, nor transform the shapes of the parameters twice. But for now I don't have a better idea; if there is a better solution, please let me know.
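One possible direction is the grad-twice approach the question mentions. A rough, untested sketch (my assumption, not part of the verified solution above): build the Hessian row by row directly from net.parameters(), with no duplicate function and no reshaping tricks. It assumes the Net class defined earlier.

import torch
import torch.nn.functional as F

net = Net(12, 12)
params = list(net.parameters())
x = torch.randn(8, 1, 12, 12)
y = torch.randint(0, 5, [8])
loss = F.cross_entropy(net(x), y)

grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])    # length 23365
n = flat_grad.numel()
H = torch.zeros(n, n)
for i in range(n):                                       # one backward pass per row: simple but slow
    row = torch.autograd.grad(flat_grad[i], params, retain_graph=True)
    H[i] = torch.cat([r.reshape(-1) for r in row])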