Why are LASSO results in sklearn (Python) and the MATLAB statistics package different?

Tags:

python

statistics

matlab

scikit-learn

linear-regression

I am using LassoCV from sklearn to select the best model by cross-validation. I found that cross-validation gives different results depending on whether I use sklearn or the MATLAB Statistics Toolbox.

I used MATLAB to replicate the example given in http://www.mathworks.se/help/stats/lasso-and-elastic-net.html and got a figure like this:

[Figure: MATLAB lassoPlot trace of the coefficients against Lambda, log x-axis]

Then I saved the MATLAB data and tried to replicate the figure with lasso_path from sklearn, and got:

[Figure: sklearn lasso_path trace of the coefficients against alpha, log x-axis]

Although the two figures are similar, there are also some differences. As far as I understand, the parameter lambda in MATLAB and alpha in sklearn are the same, yet the figures suggest otherwise. Can somebody point out which one is correct, or am I missing something? Furthermore, the coefficients obtained are also different (which is my main concern).
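Both tools minimize the same objective, (1/(2N))·Σᵢ(yᵢ − β₀ − xᵢᵀβ)² + λ·Σⱼ|βⱼ|, with MATLAB's Lambda playing the role of sklearn's alpha. A plausible source of the coefficient differences is preprocessing: MATLAB's lasso standardizes the predictors by default ('Standardize', true) and reports coefficients on the original scale, whereas LassoCV fits X as given, so the same penalty value shrinks the columns differently. The following is a minimal sketch of MATLAB-style standardization in sklearn, not MATLAB's actual implementation: the helper `lasso_matlab_style` is hypothetical, scaling by the population standard deviation (ddof=0) is an assumption, and the synthetic data only imitates the question's setup (NumPy's generator does not reproduce `rng(3,'twister')`).

```python
import numpy as np
from sklearn.linear_model import Lasso


def lasso_matlab_style(X, y, lam):
    """Hypothetical helper: lasso on standardized predictors, with the
    coefficients mapped back to the original scale (assumed to mimic
    MATLAB's default 'Standardize', true)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)               # assumption: population std (ddof=0)
    Xs = (X - mu) / sigma
    fit = Lasso(alpha=lam).fit(Xs, y)   # fit_intercept=True by default
    coef = fit.coef_ / sigma            # undo the column scaling
    intercept = fit.intercept_ - mu @ coef
    return coef, intercept


# Synthetic data shaped like the MATLAB example (not the same random numbers).
rng = np.random.default_rng(3)
X = np.column_stack([rng.exponential(scale=k, size=200) for k in range(1, 6)])
r = np.array([0.0, 2.0, 0.0, -3.0, 0.0])
y = X @ r + 0.1 * rng.standard_normal(200)

lam = 0.1
coef_std, _ = lasso_matlab_style(X, y, lam)
coef_raw = Lasso(alpha=lam).fit(X, y).coef_   # no standardization, as in LassoCV
print("standardized:", np.round(coef_std, 3))
print("raw:         ", np.round(coef_raw, 3))
```

To compare against MATLAB directly, replace the synthetic X and y with the arrays loaded from randomData.mat and pass the Lambda value that MATLAB reports.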

Matlab Code:


rng(3,'twister') % for reproducibility
X = zeros(200,5);
for ii = 1:5
      X(:,ii) = exprnd(ii,200,1);
end
r = [0;2;0;-3;0];
Y = X*r + randn(200,1)*.1;

save randomData.mat % To be used in python code

[b fitinfo] = lasso(X,Y,'cv',10);
lassoPlot(b,fitinfo,'plottype','lambda','xscale','log');

disp('Lambda with min MSE')
fitinfo.LambdaMinMSE
disp('Lambda with 1SE')
fitinfo.Lambda1SE
disp('Quality of Fit')
lambdaindex = fitinfo.Index1SE;
fitinfo.MSE(lambdaindex)
disp('Number of nonzero predictors')
fitinfo.DF(lambdaindex)
disp('Coefficient of fit at that lambda')
b(:,lambdaindex)
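Note that this script prints the coefficients at `fitinfo.Index1SE`, i.e. at Lambda1SE (the largest lambda whose cross-validated MSE is within one standard error of the minimum), whereas LassoCV's `alpha_` and `coef_` correspond to the minimum mean cross-validated MSE, so even identical fits would report different coefficients. A minimal sketch of a one-standard-error choice on top of LassoCV follows. It uses LassoCV's `alphas_` and `mse_path_` attributes; the helper name `alpha_one_se` is hypothetical, and the standard-error formula (fold standard deviation divided by the square root of the number of folds) is an assumption that may not match MATLAB's `fitinfo.SE` exactly.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import KFold


def alpha_one_se(cv_model):
    """Hypothetical helper: largest alpha whose mean CV MSE is within one
    standard error of the minimum (a 1-SE rule in the spirit of Lambda1SE)."""
    mse = cv_model.mse_path_                               # (n_alphas, n_folds)
    mean = mse.mean(axis=1)
    se = mse.std(axis=1, ddof=1) / np.sqrt(mse.shape[1])   # assumed SE formula
    i_min = np.argmin(mean)
    ok = mean <= mean[i_min] + se[i_min]
    return cv_model.alphas_[ok].max()


# Synthetic data shaped like the MATLAB example (not the same random numbers).
rng = np.random.default_rng(3)
X = np.column_stack([rng.exponential(scale=k, size=200) for k in range(1, 6)])
y = X @ np.array([0.0, 2.0, 0.0, -3.0, 0.0]) + 0.1 * rng.standard_normal(200)

cv_model = LassoCV(cv=KFold(n_splits=10, shuffle=True, random_state=0)).fit(X, y)
a1se = alpha_one_se(cv_model)
coef_1se = Lasso(alpha=a1se).fit(X, y).coef_
print("alpha (min MSE):", cv_model.alpha_, cv_model.coef_)
print("alpha (1-SE):   ", a1se, coef_1se)
```

With the data loaded from randomData.mat, `coef_1se` is the closer analogue of `b(:,lambdaindex)`, while `cv_model.coef_` is the closer analogue of the coefficients at `fitinfo.IndexMinMSE`.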

Python Code:


import scipy.io
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import lasso_path, LassoCV

data = scipy.io.loadmat('randomData.mat')
X = data['X']
Y = data['Y'].ravel()

model = LassoCV(cv=10, max_iter=1000).fit(X, Y)
print('alpha', model.alpha_)
print('coef', model.coef_)

eps = 1e-2  # the smaller it is, the longer the path
# lasso_path returns (alphas, coefs, dual_gaps) and fits no intercept,
# so center the data to match LassoCV; coefs has shape (n_features, n_alphas).
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean()
alphas_lasso, coefs_lasso, _ = lasso_path(Xc, Yc, eps=eps)

plt.figure(1)
ax = plt.gca()
ax.set_prop_cycle(color=2 * ['b', 'r', 'g', 'c', 'k'])
plt.semilogx(alphas_lasso, coefs_lasso.T)
ax.invert_xaxis()
plt.xlabel('alpha')
plt.show()
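Part of the visual difference between the two plots may come from the penalty grids. Both paths start at the smallest penalty that zeroes every coefficient, max|xⱼᵀ(y − ȳ)|/N on centered predictors, but if MATLAB standardizes the columns first (its default), its grid sits on a different scale from lasso_path run on raw X. MATLAB's default grid also spans a wider range (100 values, LambdaRatio 1e-4) than the `eps = 1e-2` used in the Python code. A minimal sketch on synthetic data, assuming population-std standardization as a stand-in for MATLAB's:

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Synthetic data shaped like the MATLAB example (not the same random numbers).
rng = np.random.default_rng(3)
X = np.column_stack([rng.exponential(scale=k, size=200) for k in range(1, 6)])
y = X @ np.array([0.0, 2.0, 0.0, -3.0, 0.0]) + 0.1 * rng.standard_normal(200)
n = X.shape[0]

# lasso_path fits no intercept, so center the data first.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
Xs = Xc / Xc.std(axis=0)  # assumed MATLAB-style standardization (ddof=0)

# Smallest penalty at which all coefficients are zero.
alpha_max_raw = np.max(np.abs(Xc.T @ yc)) / n
alpha_max_std = np.max(np.abs(Xs.T @ yc)) / n

alphas_raw, coefs_raw, _ = lasso_path(Xc, yc, eps=1e-2)
alphas_std, coefs_std, _ = lasso_path(Xs, yc, eps=1e-2)
print("top of grid, raw X:         ", alphas_raw[0], alpha_max_raw)
print("top of grid, standardized X:", alphas_std[0], alpha_max_std)
```

Because the standardized grid starts at a different value, the same numeric alpha points to different places on the two traces, even though the objectives match.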


asked Oct 05 '12 12:10

imsc

1 Answer

I do not have MATLAB, but be careful: the value obtained by cross-validation can be unstable, because it depends on how the samples are split into folds.

Even if you run the cross-validation twice in Python, you can obtain two different results. Consider this example:


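import sklearn.linear_model
import sklearn.model_selection

# x, y: your data. shuffle=True without a random_state draws new folds on every call.
# (normalize=True from the original was removed in scikit-learn 1.2, so it is dropped here.)
kf = sklearn.model_selection.KFold(n_splits=10, shuffle=True)
cv = sklearn.linear_model.LassoCV(cv=kf).fit(x, y)
print(cv.alpha_)
kf = sklearn.model_selection.KFold(n_splits=10, shuffle=True)
cv = sklearn.linear_model.LassoCV(cv=kf).fit(x, y)
print(cv.alpha_)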

0.00645093258722
0.00691712356467
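If a repeatable choice is needed, for example to rerun a comparison against MATLAB, the fold assignment can be fixed. A minimal sketch on synthetic data shaped like the question's: passing `random_state` to a shuffled `KFold` makes two LassoCV runs select the same `alpha_`, while different seeds may select different values. Fixing the seed only makes the result repeatable; it does not remove the sensitivity to how the folds are drawn, and MATLAB draws its own random partition, so the folds will still not match across the two tools.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

# Synthetic data shaped like the MATLAB example (not the same random numbers).
rng = np.random.default_rng(3)
X = np.column_stack([rng.exponential(scale=k, size=200) for k in range(1, 6)])
y = X @ np.array([0.0, 2.0, 0.0, -3.0, 0.0]) + 0.1 * rng.standard_normal(200)


def cv_alpha(seed):
    """Alpha chosen by 10-fold LassoCV with a seeded shuffle of the samples."""
    folds = KFold(n_splits=10, shuffle=True, random_state=seed)
    return LassoCV(cv=folds).fit(X, y).alpha_


same_seed = [cv_alpha(0), cv_alpha(0)]               # identical folds, identical alpha
other_seeds = [cv_alpha(seed) for seed in range(5)]  # folds change with the seed
print("same seed:  ", same_seed)
print("other seeds:", other_seeds)
```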


answered Sep 21 '22 08:09

Donbeo
