I am trying to implement L1 regularization on the first layer of a simple neural network (1 hidden layer). I looked at some other posts on Stack Overflow that apply L1 regularization in PyTorch to figure out how it should be done (references: Adding L1/L2 regularization in PyTorch?, In Pytorch, how to add L1 regularizer to activations?). No matter how high I set lambda (the L1 regularization strength parameter), I never get true zeros in the first weight matrix. Why would this be? (Code is below.)
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
class Network(nn.Module):
    def __init__(self,nf,nh,nc):
        super(Network,self).__init__()
        self.lin1=nn.Linear(nf,nh)
        self.lin2=nn.Linear(nh,nc)
    def forward(self,x):
        l1out=F.relu(self.lin1(x))
        out=F.softmax(self.lin2(l1out))
        return out, l1out

def l1loss(layer):
    return torch.norm(layer.weight.data, p=1)
nf=10
nc=2
nh=6
learningrate=0.02
lmbda=10.
batchsize=50
net=Network(nf,nh,nc)
crit=nn.MSELoss()
optimizer=torch.optim.Adagrad(net.parameters(),lr=learningrate)
xtr=torch.Tensor(xtr)
ytr=torch.Tensor(ytr)
#ytr=torch.LongTensor(ytr)
xte=torch.Tensor(xte)
yte=torch.LongTensor(yte)
#cyte=torch.Tensor(yte)
it=200
for epoch in range(it):
    per=torch.randperm(len(xtr))
    for i in range(0,len(xtr),batchsize):
        ind=per[i:i+batchsize]
        bx,by=xtr[ind],ytr[ind]
        optimizer.zero_grad()
        output, l1out=net(bx)
        # l1reg=l1loss(net.lin1)
        loss=crit(output,by)+lmbda*l1loss(net.lin1)
        loss.backward()
        optimizer.step()
    print('Epoch [%i/%i], Loss: %.4f' %(epoch+1,it, np.float32(loss.data.numpy())))
corr=0
tot=0
for x,y in list(zip(xte,yte)):
    output,_=net(x)
    _,pred=torch.max(output,-1)
    tot+=1 #y.size(0)
    corr+=(pred==y).sum()
print(corr)
Note: The data has 10 features (2 classes and 800 training samples) and only the first 2 are relevant (by design) so one would assume true zeros should be easy enough to learn.
The black circle in each contour plot marks the point where the loss contours intersect the L1 norm (Lasso) constraint region. That intersection tends to lie close to the axes, which forces some coefficients to exactly 0 and hence performs feature selection. This is why the L1 norm makes the model sparse.
Reason for sparsity: L1 regularization causes coefficients to converge to 0 rather quickly, since the constraint bounds all weight vectors to lie within an L1 ball. The convergence to 0 is faster for L1 because the derivative of the penalty term is simply λ (times the sign of the weight), whereas for L2 it is 2λw, which shrinks as the weight itself shrinks and so never pushes it all the way to zero.
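To make that gradient difference concrete, here is a small sketch (not from the original post; the weight values and lmbda are arbitrary illustration choices) comparing the two penalty gradients in PyTorch:

import torch

lmbda = 10.0
w = torch.tensor([0.5, 0.01, -0.001], requires_grad=True)

# L1 penalty: gradient is lmbda * sign(w), constant magnitude even for tiny weights
l1 = lmbda * w.abs().sum()
g1 = torch.autograd.grad(l1, w)[0]
print(g1)   # magnitudes are all lmbda (10.0), even for the smallest weight

# L2 penalty: gradient is 2 * lmbda * w, which vanishes as the weight approaches zero
l2 = lmbda * (w ** 2).sum()
g2 = torch.autograd.grad(l2, w)[0]
print(g2)   # magnitudes scale with |w|: 10.0, 0.2, 0.02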
The reason the L1 norm yields a sparse solution is its special shape: it has spikes (corners) that happen to sit at sparse points, i.e. on the axes. Using it to touch the solution surface will very likely find a touch point on a spike tip, and thus a sparse solution.
L1 regularization is the preferred choice when there is a high number of features, as it provides sparse solutions. We also gain a computational advantage, because features with zero coefficients can simply be skipped. The regression model that uses the L1 regularization technique is called Lasso regression.
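As an illustrative follow-up (this snippet is an assumption built on the question's net and its lin1 layer, not part of the original code), once the first-layer weights really are sparse you can read off which input features survived:

# Hypothetical post-training check: a feature is effectively dropped if its entire
# column of first-layer weights is (near) zero.
with torch.no_grad():
    W = net.lin1.weight             # shape (nh, nf): one column per input feature
    col_norms = W.abs().sum(dim=0)  # L1 norm of each feature's column
    kept = (col_norms > 1e-6).nonzero(as_tuple=True)[0]
    print('selected features:', kept.tolist())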
Your usage of layer.weight.data removes the parameter (which is a PyTorch variable) from its automatic differentiation context, making it a constant when the optimiser takes the gradients. This results in zero gradients from the penalty term, so the L1 loss is effectively never applied.
If you remove the .data, the norm is computed on the PyTorch variable itself and the gradients will be correct.
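For clarity, here is a minimal sketch of the corrected penalty, reusing the question's l1loss name; the only change is dropping .data:

def l1loss(layer):
    # Compute the norm on the parameter itself (not .data) so autograd tracks it
    # and the penalty contributes a gradient during backward().
    return torch.norm(layer.weight, p=1)

# used exactly as before inside the training loop:
# loss = crit(output, by) + lmbda * l1loss(net.lin1)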
For more information on PyTorch's automatic differentiation mechanics, see this docs article or this tutorial.