 

Lack of Sparse Solution with L1 Regularization in Pytorch

I am trying to apply L1 regularization to the first layer of a simple neural network (one hidden layer). I looked at some other StackOverflow posts that apply L1 regularization in PyTorch to figure out how it should be done (references: Adding L1/L2 regularization in PyTorch?, In Pytorch, how to add L1 regularizer to activations?). No matter how high I set lambda (the L1 regularization strength parameter), I never get true zeros in the first weight matrix. Why would this be? (Code is below.)

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class Network(nn.Module):
    def __init__(self,nf,nh,nc):
        super(Network,self).__init__()
        self.lin1=nn.Linear(nf,nh)
        self.lin2=nn.Linear(nh,nc)

    def forward(self,x):
        l1out=F.relu(self.lin1(x))
        out=F.softmax(self.lin2(l1out))
        return out, l1out

def l1loss(layer):
    return torch.norm(layer.weight.data, p=1)

nf=10
nc=2
nh=6
learningrate=0.02
lmbda=10.
batchsize=50

net=Network(nf,nh,nc)

crit=nn.MSELoss()
optimizer=torch.optim.Adagrad(net.parameters(),lr=learningrate)


xtr=torch.Tensor(xtr)
ytr=torch.Tensor(ytr)
#ytr=torch.LongTensor(ytr)
xte=torch.Tensor(xte)
yte=torch.LongTensor(yte)
#cyte=torch.Tensor(yte)

it=200
for epoch in range(it):
    per=torch.randperm(len(xtr))
    for i in range(0,len(xtr),batchsize):
        ind=per[i:i+batchsize]
        bx,by=xtr[ind],ytr[ind]            
        optimizer.zero_grad()
        output, l1out=net(bx)
#        l1reg=l1loss(net.lin1)    
        loss=crit(output,by)+lmbda*l1loss(net.lin1)
        loss.backward()
        optimizer.step()
    print('Epoch [%i/%i], Loss: %.4f' %(epoch+1,it, np.float32(loss.data.numpy())))

corr=0
tot=0
for x,y in list(zip(xte,yte)):
    output,_=net(x)
    _,pred=torch.max(output,-1)
    tot+=1 #y.size(0)
    corr+=(pred==y).sum()
print(corr)

Note: The data has 10 features (2 classes, 800 training samples), and by design only the first 2 features are relevant, so one would assume true zeros should be easy enough to learn.
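(The post does not show how xtr, ytr, xte and yte are created. Purely as a hypothetical sketch, data matching the description above could be generated like this; the test-set size of 200 and the one-hot float targets for MSELoss are assumptions based on how the code uses these tensors.)

import numpy as np

rng = np.random.RandomState(0)
xtr = rng.randn(800, 10).astype(np.float32)                # 800 training samples, 10 features
tr_labels = (xtr[:, 0] + xtr[:, 1] > 0).astype(np.int64)   # class depends only on the first 2 features
ytr = np.eye(2, dtype=np.float32)[tr_labels]               # one-hot float targets for MSELoss
xte = rng.randn(200, 10).astype(np.float32)                # hypothetical test split
yte = (xte[:, 0] + xte[:, 1] > 0).astype(np.int64)         # integer labels for the accuracy check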

cyradil asked Apr 27 '18

People also ask

Can L1 regularization result in sparse models?

Yes. In the usual contour picture, the loss contours intersect the L1-norm (Lasso) constraint region at a point that lies close to the axes. That intersection drives some coefficients exactly to 0, which amounts to feature selection; hence the L1 norm makes the model sparse.

What is L1 regularization? Why does it result in a sparse solution?

Reason for sparsity: L1 regularization causes coefficients to converge to 0 rather quickly, since the constraint bounds all weight vectors to lie within an L1 ball. The shrinkage is stronger for L1 because the first derivative of the penalty is simply ±λ, whereas for L2 it is 2λw, which itself shrinks towards 0 as the weight approaches 0.
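A quick way to sanity-check that derivative claim in PyTorch (an illustrative sketch, not part of the original answer; the values of lmbda and w are arbitrary):

import torch

lmbda = 0.1
w = torch.tensor([0.5, -2.0], requires_grad=True)

l1 = lmbda * w.abs().sum()
l1.backward()
print(w.grad)      # +/- lambda:     tensor([ 0.1000, -0.1000])

w.grad = None
l2 = lmbda * (w ** 2).sum()
l2.backward()
print(w.grad)      # 2 * lambda * w: tensor([ 0.1000, -0.4000])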

Why does L1 norm enforce sparsity?

The L1 norm finds sparse solutions because of its special shape: its unit ball has spikes (corners) that lie exactly at sparse points. When it touches the solution surface, the contact point is very likely to be on a spike tip, and thus a sparse solution.

Which regularization makes parameters more sparse?

L1 regularization is the preferred choice when there is a high number of features, as it provides sparse solutions. It also brings a computational advantage, because features with zero coefficients can be skipped. The regression model that uses the L1 regularization technique is called Lasso Regression.
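A quick way to see this sparsity in practice is scikit-learn's Lasso (an illustrative sketch; the synthetic data and the alpha value are arbitrary choices, not from the original text):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(200)   # only the first 2 features matter

print(Lasso(alpha=0.1).fit(X, y).coef_)   # most coefficients are exactly 0
print(Ridge(alpha=0.1).fit(X, y).coef_)   # coefficients are small but non-zero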


1 Answer

Your use of layer.weight.data detaches the parameter (which is a PyTorch variable) from its automatic differentiation context, so it is treated as a constant when the optimiser computes gradients. The L1 term therefore contributes zero gradient and is effectively never enforced.

If you remove the .data, the norm is computed on the PyTorch variable itself and the gradients will be correct.
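Concretely, a minimal sketch of the fixed penalty (reusing the function name from the question):

import torch
import torch.nn as nn

def l1loss(layer):
    # no .data: the norm stays in the autograd graph, so gradients flow back to the weights
    return torch.norm(layer.weight, p=1)

# quick check of the difference:
lin = nn.Linear(10, 6)
print(torch.norm(lin.weight.data, p=1).requires_grad)   # False -> treated as a constant
print(l1loss(lin).requires_grad)                         # True  -> contributes gradients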

For more information on PyTorch's automatic differentiation mechanics, see the autograd documentation or the autograd tutorial.

Pim answered Oct 24 '22