
PyTorch: how to add an L1 regularizer to activations?

Tags:

python

pytorch

I would like to add the L1 regularizer to the activations output from a ReLU. More generally, how does one add a regularizer only to a particular layer in the network?


Related material:

  • This similar post refers to adding L2 regularization, but it appears to add the regularization penalty to all layers of the network.

  • nn.modules.loss.L1Loss() seems relevant, but I do not yet understand how to use it (a usage sketch follows this list).

  • The legacy module L1Penalty seems relevant also, but why has it been deprecated?

asked Jun 20 '17 by Bull

2 Answers

Here is how you do this:

  • In your Module's forward, return both the final output and the outputs of the layers you want to apply L1 regularization to.
  • The loss variable will then be the sum of the cross-entropy loss of the output w.r.t. the targets and the L1 penalties.

Here's some example code:

import torch
from torch.autograd import Variable
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out, layer1_out, layer2_out


batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Usually the following code is looped over all batches,
# but let's just do a dummy batch for brevity.
# (Variable is a historical wrapper; in modern PyTorch plain tensors suffice.)
inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())

optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

# Flatten each layer's parameters into a single vector and take its norm.
all_linear1_params = torch.cat([x.view(-1) for x in model.linear1.parameters()])
all_linear2_params = torch.cat([x.view(-1) for x in model.linear2.parameters()])
l1_regularization = lambda1 * torch.norm(all_linear1_params, 1)
l2_regularization = lambda2 * torch.norm(all_linear2_params, 2)

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
answered Sep 24 '22 by Sasank Chilamkurthy


All of the (other current) responses are incorrect in some way, as the question is about adding regularization to activations. The answer above is closest in that it suggests summing the norms of the outputs, which is correct, but its code sums the norms of the weights, which is incorrect.

The correct way is not to modify the network code, but rather to capture the outputs via a forward hook, as in the OutputHook class below. From there, summing the norms of the outputs is straightforward, but one needs to take care to clear the captured outputs on every iteration.

import torch


class OutputHook(list):
    """ Hook to capture module outputs. """
    def __call__(self, module, input, output):
        self.append(output)


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)
        # Instantiate ReLU, so a hook can be registered to capture its output.
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        layer1_out = self.relu(self.linear1(x))
        layer2_out = self.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out


batch_size = 4
l1_lambda = 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
# Register hook to capture the ReLU outputs. Non-trivial networks will often
# require hooks to be applied more judiciously.
output_hook = OutputHook()
model.relu.register_forward_hook(output_hook)

inputs = torch.rand(batch_size, 128)
targets = torch.ones(batch_size).long()

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = torch.nn.functional.cross_entropy(outputs, targets)

# Compute the L1 penalty over the ReLU outputs captured by the hook.
l1_penalty = 0.
for output in output_hook:
    l1_penalty += torch.norm(output, 1)
l1_penalty *= l1_lambda

loss = cross_entropy_loss + l1_penalty
loss.backward()
optimizer.step()
output_hook.clear()
answered Sep 21 '22 by ndronen