Invalid parameter for sklearn estimator pipeline

Question

I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16.

The code I am using:

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
param_grid = {"logisticregression_C": [0.001, 0.01, 0.1, 1, 10, 100], "tfidfvectorizer_ngram_range": [(1,1), (1,2), (1,3)]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best cross-validation score: {:.2f}".format(grid.best_score_))

The error being returned boils down to:

ValueError: Invalid parameter logisticregression_C for estimator Pipeline

Is this an error related to using Make_pipeline from v.0.16? What is causing this error?

Vivek Kumar · Accepted Answer

There should be two underscores between estimator name and it's parameters in a Pipeline logisticregression__C. Do the same for tfidfvectorizer

See the example at http://scikit-learn.org/stable/auto_examples/plot_compare_reduction.html#sphx-glr-auto-examples-plot-compare-reduction-py

Bex T. · Answer

For a more general answer to using Pipeline in a GridSearchCV, the parameter grid for the model should start with whatever name you gave when defining the pipeline. For example:

# Pay attention to the name of the second step, i. e. 'model'
pipeline = Pipeline(steps=[
     ('preprocess', preprocess),
     ('model', Lasso())
])

# Define the parameter grid to be used in GridSearch
param_grid = {'model__alpha': np.arange(0, 1, 0.05)}

search = GridSearchCV(pipeline, param_grid)
search.fit(X_train, y_train)

In the pipeline, we used the name model for the estimator step. So, in the grid search, any hyperparameter for Lasso regression should be given with the prefix model__. The parameters in the grid depends on what name you gave in the pipeline. In plain-old GridSearchCV without a pipeline, the grid would be given like this:

param_grid = {'alpha': np.arange(0, 1, 0.05)}
search = GridSearchCV(Lasso(), param_grid)

You can find out more about GridSearch from this post.

Eric Wiener · Answer

Note that if you are using a pipeline with a voting classifier and a column selector, you will need multiple layers of names:

pipe1 = make_pipeline(ColumnSelector(cols=(0, 1)),
                      LogisticRegression())
pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)),
                      SVC())
votingClassifier = VotingClassifier(estimators=[
        ('p1', pipe1), ('p2', pipe2)])

You will need a param grid that looks like the following:

param_grid = { 
        'p2__svc__kernel': ['rbf', 'poly'],
        'p2__svc__gamma': ['scale', 'auto'],
    }

p2 is the name of the pipe and svc is the default name of the classifier you create in that pipe. The third element is the parameter you want to modify.

Invalid parameter for sklearn estimator pipeline

Tags:

python

scikit-learn

grid-search

scikit-learn-pipeline

sudo_coffee

3 Answers

Vivek Kumar

Bex T.

Eric Wiener

Recent Activity

Donate For Us

Invalid parameter for sklearn estimator pipeline

Tags:

python

scikit-learn

grid-search

scikit-learn-pipeline

sudo_coffee

3 Answers

Vivek Kumar

Bex T.

Eric Wiener

Related questions

Recent Activity

Donate For Us