I am performing linear regression using the Lasso method in sklearn.
According to their guidance, and advice I have seen elsewhere, instead of simply running cross validation on all of the training data it is advised to split it up into more traditional training set / validation set partitions.
The Lasso is thus trained on the training set, and the hyperparameter alpha is tuned on the basis of cross-validation results on the validation set. Finally, the accepted model is used on the test set to give a realistic view of how it will perform in reality. Separating the concerns out here is a preventative measure against overfitting.
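Roughly, this is the kind of split I mean (a sketch; the 60/20/20 proportions are just an illustrative choice):

from sklearn.model_selection import train_test_split

# X, y: the full dataset
# Hold out a final test set first, then carve a validation set out of the rest
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25)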
Actual Question
Does LassoCV conform to the above protocol, or does it somehow train the model parameters and hyperparameters on the same data and/or during the same rounds of CV?
Thanks.
"cross_val_score" splits the data into say 5 folds. Then for each fold it fits the data on 4 folds and scores the 5th fold. Then it gives you the 5 scores from which you can calculate a mean and variance for the score. You crossval to tune parameters and get an estimate of the score.
The difference between Lasso and LassoCV is that Lasso expects you to set the penalty yourself, whereas LassoCV performs a grid search over the regularization strength, using cross-validated MSE (CV-MSE) to find an optimal choice.
sklearn.linear_model.LassoCV is the cross-validated implementation of Lasso regression. LassoCV takes a parameter cv, which specifies the number of folds to use when applying cross-validation. In the example below, cv is set to 5.
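A minimal sketch of such an example (the generated data is a placeholder):

from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)

# cv=5: LassoCV runs 5-fold cross-validation internally over a grid of alphas
reg = LassoCV(cv=5).fit(X, y)
print(reg.alpha_)  # the regularization strength chosen by cross-validated MSE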
If you use sklearn.model_selection.cross_val_score with a sklearn.linear_model.LassoCV object, then you are performing nested cross-validation. cross_val_score will divide your data into train and test sets according to how you specify the folds (which can be done with objects such as sklearn.model_selection.KFold). The train set will be passed to the LassoCV, which itself performs another split of the data in order to choose the right penalty. This, it seems, corresponds to the setting you are seeking.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LassoCV

# Toy data: 20 samples, 10 random features
X = np.random.randn(20, 10)
y = np.random.randn(len(X))

cv_outer = KFold(n_splits=5)  # outer loop: scores the tuned model on held-out folds
lasso = LassoCV(cv=3)  # cv=3 makes a KFold inner splitting with 3 folds to choose alpha
scores = cross_val_score(lasso, X, y, cv=cv_outer)  # one score per outer fold
Answer: no, LassoCV will not do all the work for you; you have to use it in conjunction with cross_val_score to obtain what you want. At the same time, this is a reasonable way of implementing such objects, since one can also be interested in only fitting a hyperparameter-optimized LassoCV without necessarily evaluating it directly on another set of held-out data.
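Continuing the snippet above, using the LassoCV on its own would look like this (just a sketch, reusing the X, y, and lasso defined there):

# Fit on all the data: alpha is tuned via the internal 3-fold CV,
# but no outer held-out score is produced
lasso.fit(X, y)
print(lasso.alpha_)  # the selected penalty
print(lasso.coef_)   # coefficients at that penalty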