I'd like to be able to use pipelines in the RandomizedSearchCV construct in sklearn. However, right now I believe that only estimators are supported. Here's an example of what I'd like to be able to do:
import numpy as np
from sklearn.grid_search import RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# get some data
iris = load_digits()
X, y = iris.data, iris.target

# specify parameters and distributions to sample from
param_dist = {'C': [1, 10, 100, 1000],
              'gamma': [0.001, 0.0001],
              'kernel': ['rbf', 'linear']}

# create pipeline with a scaler
steps = [('scaler', StandardScaler()), ('rbf_svm', SVC())]
pipeline = Pipeline(steps)

# do search
search = RandomizedSearchCV(pipeline,
                            param_distributions=param_dist, n_iter=50)
search.fit(X, y)
print search.grid_scores_
If you run it just like this, you'll get the following error:
ValueError: Invalid parameter kernel for estimator Pipeline
Is there a good way to do this in sklearn?
All in all, scikit-learn pipelines are a way to chain all of the steps of a machine learning task together in a more concise manner. They do not by themselves improve model performance, but their ability to streamline the machine learning workflow makes them invaluable.
The only difference between the two approaches is that in grid search we define all of the parameter combinations to be evaluated, whereas RandomizedSearchCV samples combinations at random. Both are effective ways of tuning hyperparameters to improve model generalization.
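To make that contrast concrete, here is a minimal sketch of both searches over the same parameter space (assuming a recent scikit-learn where both classes live in sklearn.model_selection; the parameter values are only illustrative):

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
param_space = {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001]}

# GridSearchCV tries every combination in the grid (4 * 2 = 8 candidates).
grid = GridSearchCV(SVC(), param_space, cv=3)
grid.fit(X, y)

# RandomizedSearchCV samples n_iter candidates at random from the same space.
rand = RandomizedSearchCV(SVC(), param_distributions=param_space, n_iter=5,
                          cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)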
The only difference between Pipeline and make_pipeline is that make_pipeline generates the step names automatically.
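A quick sketch of that naming difference (the step names are what you will later prefix parameters with in a search):

from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Explicit names chosen by you vs. names derived from the class names.
explicit = Pipeline([('scaler', StandardScaler()), ('rbf_svm', SVC())])
auto = make_pipeline(StandardScaler(), SVC())

print([name for name, _ in explicit.steps])  # ['scaler', 'rbf_svm']
print([name for name, _ in auto.steps])      # ['standardscaler', 'svc']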
I think this is what you need (section 3): check pipeline.get_params().keys() and make sure your param grid keys match the names it returns.
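For the pipeline from the question, that check looks roughly like this (a sketch; the exact set of keys depends on your scikit-learn version):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([('scaler', StandardScaler()), ('rbf_svm', SVC())])

# The '<step>__<param>' entries are the only keys a search will accept.
print(sorted(pipeline.get_params().keys()))
# ... includes 'rbf_svm__C', 'rbf_svm__gamma', 'rbf_svm__kernel',
# 'scaler__with_mean', 'scaler__with_std', among others.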
RandomizedSearchCV, as well as GridSearchCV, does support pipelines (in fact, the searchers are independent of the estimator's implementation, and pipelines are designed to behave like ordinary estimators).
The key to the issue is straightforward once you think about which parameters the search should be done over. Since a pipeline consists of many objects (several transformers plus a classifier), one may want to find optimal parameters both for the classifier and for the transformers. Thus, you need some way to distinguish which component's properties to get or set.
So what you need to do is say that you want to find a value not for some abstract gamma (which the pipeline doesn't have at all), but for the gamma of the pipeline's classifier, which in your case is called rbf_svm (that also justifies the need for step names). This is achieved with the double-underscore syntax, widely used in sklearn for nested models:
param_dist = {
    'rbf_svm__C': [1, 10, 100, 1000],
    'rbf_svm__gamma': [0.001, 0.0001],
    'rbf_svm__kernel': ['rbf', 'linear'],
}
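Putting it together, the search from the question runs once the parameter names carry the step prefix. Here is a sketch using the current sklearn.model_selection import path (with the old sklearn.grid_search module the idea is the same); n_iter is lowered to 10 because this discrete space only has 16 combinations:

from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

pipeline = Pipeline([('scaler', StandardScaler()), ('rbf_svm', SVC())])
param_dist = {
    'rbf_svm__C': [1, 10, 100, 1000],
    'rbf_svm__gamma': [0.001, 0.0001],
    'rbf_svm__kernel': ['rbf', 'linear'],
}

# Parameters are now routed to the 'rbf_svm' step, so the search fits cleanly.
search = RandomizedSearchCV(pipeline, param_distributions=param_dist,
                            n_iter=10, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)

Note that in recent scikit-learn versions the per-candidate results live in search.cv_results_; grid_scores_ has been removed.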