I would like to add the L1 regularizer to the activations output from a ReLU. More generally, how does one add a regularizer only to a particular layer in the network?
Related material:
This similar post refers to adding L2 regularization, but it appears to add the regularization penalty to all layers of the network.
nn.modules.loss.L1Loss() seems relevant, but I do not yet understand how to use it; my best guess at its usage is sketched below.
The legacy module L1Penalty seems relevant also, but why has it been deprecated?
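As far as I can tell, nn.L1Loss simply computes the (mean or summed) absolute difference between an input tensor and a target tensor, so presumably one could point it at activations with an all-zeros target to obtain an L1 penalty. A rough sketch of that guess (the shapes are made up purely for illustration):

import torch

# Rough sketch, not a confirmed pattern: with an all-zeros target, L1Loss
# reduces to the sum of absolute values of the input, i.e. an L1 penalty.
activations = torch.rand(4, 32)              # stand-in for some ReLU output
l1_loss = torch.nn.L1Loss(reduction='sum')   # sum of |activations - 0|
penalty = l1_loss(activations, torch.zeros_like(activations))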
Here is how you do this: the loss variable will be the sum of the cross-entropy loss of the outputs w.r.t. the targets and the regularization penalties. Here's example code:
import torch
from torch.nn import functional as F

class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out, layer1_out, layer2_out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Usually the following code would be looped over all batches,
# but let's just do a dummy batch for brevity.
inputs = torch.rand(batchsize, 128)
targets = torch.ones(batchsize).long()

optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

# Flatten each layer's parameters into a single vector and take its norm.
all_linear1_params = torch.cat([x.view(-1) for x in model.linear1.parameters()])
all_linear2_params = torch.cat([x.view(-1) for x in model.linear2.parameters()])
l1_regularization = lambda1 * torch.norm(all_linear1_params, 1)
l2_regularization = lambda2 * torch.norm(all_linear2_params, 2)

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
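Since forward() above already returns layer1_out and layer2_out, the same pattern can be aimed at the activations instead of the weights. A minimal variation (a sketch, not part of the original listing) that would replace the weight-norm lines above, reusing its variables:

# Sketch: penalize the ReLU activations themselves rather than the weights.
l1_activation_penalty = lambda1 * torch.norm(layer1_out.reshape(-1), 1)
l2_activation_penalty = lambda2 * torch.norm(layer2_out.reshape(-1), 2)
loss = cross_entropy_loss + l1_activation_penalty + l2_activation_penalty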
All of the (other current) responses are incorrect in some way, as the question is about adding regularization to activations. This one is closest in that it suggests summing the norms of the outputs, which is correct, but the code sums the norms of the weights, which is incorrect.
The correct way is not to modify the network code, but rather to capture the outputs via a forward hook, as in the OutputHook class below. From there, summing the norms of the outputs is straightforward, but one needs to take care to clear the captured outputs every iteration.
import torch

class OutputHook(list):
    """ Hook to capture module outputs. """
    def __call__(self, module, input, output):
        self.append(output)

class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)
        # Instantiate ReLU, so a hook can be registered to capture its output.
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        layer1_out = self.relu(self.linear1(x))
        layer2_out = self.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out

batch_size = 4
l1_lambda = 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Register hook to capture the ReLU outputs. Non-trivial networks will often
# require hooks to be applied more judiciously.
output_hook = OutputHook()
model.relu.register_forward_hook(output_hook)

inputs = torch.rand(batch_size, 128)
targets = torch.ones(batch_size).long()

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = torch.nn.functional.cross_entropy(outputs, targets)

# Compute the L1 penalty over the ReLU outputs captured by the hook.
l1_penalty = 0.
for output in output_hook:
    l1_penalty += torch.norm(output, 1)
l1_penalty *= l1_lambda

loss = cross_entropy_loss + l1_penalty
loss.backward()
optimizer.step()
output_hook.clear()
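One practical footnote (a usage sketch, not part of the listing above): register_forward_hook returns a handle, so the registration line could instead keep it and detach the hook once it is no longer needed, e.g. when switching to inference where no penalty is computed:

# Sketch reusing `model` and `output_hook` from the listing above. The handle
# returned by register_forward_hook detaches the hook on remove().
handle = model.relu.register_forward_hook(output_hook)
# ... training iterations as above ...
handle.remove()       # stop capturing ReLU outputs
output_hook.clear()   # drop any tensors the hook still holds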