I'm new to sklearn's Pipeline and GridSearchCV features. I am trying to build a pipeline which first does RandomizedPCA on my training data and then fits a ridge regression model. Here is my code:
import numpy as np
from sklearn.decomposition import RandomizedPCA
from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

pca = RandomizedPCA(1000, whiten=True)
rgn = Ridge()
pca_ridge = Pipeline([('pca', pca),
                      ('ridge', rgn)])
parameters = {'ridge__alpha': 10 ** np.linspace(-5, -2, 3)}
grid_search = GridSearchCV(pca_ridge, parameters, cv=2, n_jobs=1,
                           scoring='mean_squared_error')
grid_search.fit(train_x, train_y[:, 1:])
I know about the RidgeCV function but I want to try out Pipeline and GridSearchCV.
I want the grid search CV to report RMSE, but that doesn't seem to be supported in sklearn, so I'm making do with MSE. However, the scores it reports are negative:
In [41]: grid_search.grid_scores_
Out[41]:
[mean: -0.02665, std: 0.00007, params: {'ridge__alpha': 1.0000000000000001e-05},
mean: -0.02658, std: 0.00009, params: {'ridge__alpha': 0.031622776601683791},
mean: -0.02626, std: 0.00008, params: {'ridge__alpha': 100.0}]
Obviously this isn't possible for mean squared error - what am I doing wrong here?
GridSearchCV tries every combination of the values passed in the parameter dictionary and evaluates the model for each combination using cross-validation. After fitting, we get a score for every combination of hyperparameters and can pick the one with the best performance.
The purpose of a pipeline is to assemble several steps that can be cross-validated together while setting different parameters. To support this, it lets you set the parameters of each step using the step name and the parameter name separated by '__', as in the 'ridge__alpha' key above.
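For example, here is a minimal sketch (using StandardScaler in place of the question's PCA step, purely for illustration) of how to discover which step__parameter keys a pipeline exposes:

from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([('scaler', StandardScaler()), ('ridge', Ridge())])

# Every step's parameters appear as '<step name>__<parameter name>'
print(sorted(pipe.get_params().keys()))  # includes 'ridge__alpha'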
Pipeline requires you to name the steps manually; the names are defined explicitly, with no rules. make_pipeline names the steps automatically, using a straightforward rule: the lowercased class name of each estimator.
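A quick sketch of the difference:

from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline: you pick the step names yourself
explicit = Pipeline([('scale', StandardScaler()), ('model', Ridge())])

# make_pipeline: names come from the lowercased class names
auto = make_pipeline(StandardScaler(), Ridge())
print([name for name, _ in auto.steps])  # ['standardscaler', 'ridge']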
Those scores are negated MSE scores, i.e. negate them and you get the MSE. The thing is that GridSearchCV, by convention, always tries to maximize its score, so loss functions like MSE have to be negated.
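So to get the RMSE the question asks about, flip the sign and take the square root. A minimal sketch against the question's grid_scores_ attribute (the old, pre-0.18 API; newer sklearn versions expose cv_results_ instead):

import numpy as np

# Each entry holds (parameters, mean validation score, per-fold scores);
# the stored score is -MSE, so negate it before taking the root.
for params, mean_score, cv_scores in grid_search.grid_scores_:
    print(params, 'RMSE:', np.sqrt(-mean_score))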
An alternative is to build the scorer yourself with make_scorer, setting the greater_is_better flag to False. So, if clf is your classifier and parameters is your hyperparameter grid, you can use make_scorer like this:
from sklearn.metrics import make_scorer, mean_squared_error

# Define your own MSE scorer; greater_is_better=False marks it as a loss
mse = make_scorer(mean_squared_error, greater_is_better=False)
Now, just as before, you can call GridSearchCV and pass in your defined mse scorer:
grid_obj = GridSearchCV(clf, parameters, cv=5, scoring=mse, n_jobs=-1, verbose=True)
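Fitting then works as usual. A short usage sketch (clf, parameters, X and y are assumed to be defined already):

grid_obj.fit(X, y)

# make_scorer with greater_is_better=False stores the negated loss,
# so best_score_ is negative; negate it to read it as a plain MSE.
print(grid_obj.best_params_)
print('best MSE:', -grid_obj.best_score_)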