When searching for ways to implement L1 regularization in PyTorch models, I came across this question, which is now two years old, so I was wondering whether there is anything new on this topic.
I also found this more recent approach to dealing with the missing L1 function. However, I don't understand how to use it for a basic NN as shown below.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFNNModel(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, dropout_rate):
        super(FFNNModel, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.dropout_rate = dropout_rate
        self.drop_layer = nn.Dropout(p=self.dropout_rate)
        self.fully = nn.ModuleList()
        current_dim = input_dim
        for h_dim in hidden_dim:
            self.fully.append(nn.Linear(current_dim, h_dim))
            current_dim = h_dim
        self.fully.append(nn.Linear(current_dim, output_dim))

    def forward(self, x):
        for layer in self.fully[:-1]:
            x = self.drop_layer(F.relu(layer(x)))
        x = F.softmax(self.fully[-1](x), dim=0)
        return x
I was hoping simply putting this before training would work:
model = FFNNModel(30,5,[100,200,300,100],0.2)
regularizer = _Regularizer(model)
regularizer = L1Regularizer(regularizer, lambda_reg=0.1)
with
out = model(inputs)
loss = criterion(out, target) + regularizer.__add_l1()
Does anyone understand how to apply these 'ready to use' classes?
I haven't run the code in question, so please report back if something doesn't work exactly as described. Generally, I would say that the code you linked is needlessly complicated (this may be because it tries to be generic and allow for all the kinds of regularization it lists). The way it is meant to be used is, I suppose,
model = FFNNModel(30,5,[100,200,300,100],0.2)
regularizer = L1Regularizer(model, lambda_reg=0.1)
and then
out = model(inputs)
loss = criterion(out, target) + regularizer.regularized_all_param(0.)
You can check that regularized_all_param will just iterate over the parameters of your model and, if their name ends with weight, accumulate the sum of their absolute values. For some reason the accumulator has to be initialized manually; that's why we pass in the 0.
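For intuition, the behaviour described above amounts to something like the sketch below. This is a simplified, hypothetical stand-in written by me, not the repository's actual method:

def l1_of_weight_params(model, reg_loss=0., lambda_reg=0.1):
    # Rough sketch of what regularized_all_param does in spirit: walk the
    # named parameters and accumulate the scaled L1 norm of every tensor
    # whose name ends with 'weight'. The real class in the linked repo
    # wraps this in more machinery.
    for name, param in model.named_parameters():
        if name.endswith('weight'):
            reg_loss = reg_loss + lambda_reg * param.abs().sum()
    return reg_loss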
Really, though, if you just want L1 regularization and don't need any bells and whistles, the more manual approach, akin to your first link, will be more readable. It would go like this:
l1_regularization = 0.
for param in model.parameters():
    l1_regularization += param.abs().sum()
loss = criterion(out, target) + l1_regularization
This is really what is at the heart of both approaches. You use the Module.parameters method to iterate over all model parameters and sum up their L1 norms, which then becomes a term in your loss function. That's it. The repo you linked comes up with some fancy machinery to abstract it away but, judging by your question, it fails :)
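For a complete picture, one training step with this manual penalty could look roughly like the following. The training_step name, the lambda_l1 coefficient, and the assumption that model, criterion, and optimizer already exist are all mine, added only for illustration:

def training_step(model, criterion, optimizer, inputs, target, lambda_l1=0.1):
    # One hypothetical optimization step with a manually added L1 term.
    # lambda_l1 scales the penalty; without it the raw sum of absolute
    # values of all parameters easily dominates the data loss.
    optimizer.zero_grad()
    out = model(inputs)
    l1_regularization = 0.
    for param in model.parameters():
        l1_regularization = l1_regularization + param.abs().sum()
    loss = criterion(out, target) + lambda_l1 * l1_regularization
    loss.backward()
    optimizer.step()
    return loss.item()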
SIMPLE SOLUTION for anyone stumbling over this:
There were always some issues with the Regularizer_ classes in the link above, so I solved the problem with plain functions instead, adding an orthogonal regularizer as well:
def l1_regularizer(model, lambda_l1=0.01):
    lossl1 = 0
    for model_param_name, model_param_value in model.named_parameters():
        # only penalize weight matrices, not biases
        if model_param_name.endswith('weight'):
            lossl1 += lambda_l1 * model_param_value.abs().sum()
    return lossl1

def orth_regularizer(model, lambda_orth=0.01):
    lossorth = 0
    for model_param_name, model_param_value in model.named_parameters():
        if model_param_name.endswith('weight'):
            # penalize deviation of W @ W.T from the identity
            param_flat = model_param_value.view(model_param_value.shape[0], -1)
            sym = torch.mm(param_flat, torch.t(param_flat))
            sym -= torch.eye(param_flat.shape[0], device=param_flat.device)
            # use the absolute value so positive and negative entries
            # cannot cancel out (and the penalty cannot go negative)
            lossorth += lambda_orth * sym.abs().sum()
    return lossorth
and during training do:
loss = criterion(outputs, y_data) \
       + l1_regularizer(model, lambda_l1=lambda_l1) \
       + orth_regularizer(model, lambda_orth=lambda_orth)
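If it helps, here is a sketch of how those two helpers might sit inside a full training loop; the loader, optimizer, and criterion are placeholder names I am assuming, not part of the answer above:

def train_one_epoch(model, loader, criterion, optimizer,
                    lambda_l1=0.01, lambda_orth=0.01):
    # Hypothetical epoch loop combining both penalties with the data loss.
    model.train()
    for x_data, y_data in loader:
        optimizer.zero_grad()
        outputs = model(x_data)
        loss = criterion(outputs, y_data) \
               + l1_regularizer(model, lambda_l1=lambda_l1) \
               + orth_regularizer(model, lambda_orth=lambda_orth)
        loss.backward()
        optimizer.step()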