
Applying GridSearchCV to a scikit-learn pipeline [[feature selection] + [algorithm]] gives an "Invalid parameter" error

I would like to apply GridSearchCV to a scikit-learn pipeline [[feature selection] + [algorithm]], but it gives the following error. How can I correct the code?

from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import SelectFromModel

pipeline1 = Pipeline([
    ('feature_selection', SelectFromModel(svm.SVC(kernel='linear'))),
    ('filter', SelectKBest(k=11)),
    ('classification', svm.SVC(kernel='linear')),
])
grid_parameters_tune = [{'estimator__C': [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]}]
model = GridSearchCV(pipeline1, grid_parameters_tune, cv=5, n_jobs=-1,
                     verbose=1)
model.fit(X, y)


ValueError: Invalid parameter estimator for estimator Pipeline(memory=None,
steps=[('feature_union', FeatureUnion(n_jobs=None,
transformer_list=[('filter', SelectKBest(k=10, score_func=<function f_classif at 0x000001ECCBB3E840>)), ('feature_selection', SelectFromModel(estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', ...r', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False))]). Check the list of available parameters with `estimator.get_params().keys()`.
Srona asked Sep 01 '25 01:09


1 Answer

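I think the error comes from the key in your grid_parameters_tune. You are trying to access estimator__C, but there is no step named estimator in your pipeline. GridSearchCV resolves each key as <step name>__<parameter name>, so the part before the double underscore must match a step name. Renaming the key to classification__C should do the trick.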
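The failure can be reproduced without running a grid search, because GridSearchCV passes each grid key to the pipeline's set_params. A minimal sketch, reusing the step names from the question (the values 1.0 and 10.0 are arbitrary):

```python
from sklearn import svm
from sklearn.feature_selection import SelectFromModel, SelectKBest
from sklearn.pipeline import Pipeline

pipeline1 = Pipeline(
    [
        ("feature_selection", SelectFromModel(svm.SVC(kernel="linear"))),
        ("filter", SelectKBest(k=11)),
        ("classification", svm.SVC(kernel="linear")),
    ]
)

# "estimator" is not a step name, so this raises the same kind of ValueError.
error = None
try:
    pipeline1.set_params(estimator__C=1.0)
except ValueError as exc:
    error = str(exc)
    print(error)

# "classification" is a step name, so this reaches the final SVC.
pipeline1.set_params(classification__C=10.0)
print(pipeline1.named_steps["classification"].C)
```

The exact wording of the error message may differ between scikit-learn versions.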

If you instead want to tune the C parameter of the SVC wrapped inside SelectFromModel, use the key feature_selection__estimator__C: the step name, then SelectFromModel's estimator argument, then the parameter.
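As the error message suggests, the valid keys can be listed with get_params(). A short sketch, again assuming the pipeline from the question, that filters the names down to those ending in __C:

```python
from sklearn import svm
from sklearn.feature_selection import SelectFromModel, SelectKBest
from sklearn.pipeline import Pipeline

pipeline1 = Pipeline(
    [
        ("feature_selection", SelectFromModel(svm.SVC(kernel="linear"))),
        ("filter", SelectKBest(k=11)),
        ("classification", svm.SVC(kernel="linear")),
    ]
)

# Every key returned by get_params() is a valid name for the parameter grid.
param_names = sorted(pipeline1.get_params().keys())

# Keep only the keys that address a C parameter somewhere in the pipeline.
c_params = [name for name in param_names if name.endswith("__C")]
print(c_params)
```

Both SVC instances show up with their full path, while a bare estimator__C does not appear in the list.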

Below is a working example with random data. I changed some of the parameters from your original pipeline (a smaller C grid and cv=3) to save time, so do not copy it verbatim.

import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.feature_selection import SelectFromModel, SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X = pd.DataFrame(data=np.arange(1000).reshape(-1, 25))
y = np.random.binomial(1, 0.5, 1000//25)


pipeline1 = Pipeline(
    [
        ("feature_selection", SelectFromModel(svm.SVC(kernel="linear"))),
        ("filter", SelectKBest(k=11)),
        ("classification", svm.SVC(kernel="linear")),
    ]
)
grid_parameters_tune = [{"classification__C": [0.01, 0.1, 1.0, 10.0,]}]
model = GridSearchCV(pipeline1, grid_parameters_tune, cv=3, n_jobs=-1, verbose=1)
model.fit(X, y)

For the second option, tuning the C of the SVC inside SelectFromModel:

pipeline1 = Pipeline(
    [
        ("feature_selection", SelectFromModel(svm.SVC(kernel="linear"))),
        ("filter", SelectKBest(k=11)),
        ("classification", svm.SVC(kernel="linear")),
    ]
)
grid_parameters_tune = [{"feature_selection__estimator__C": [0.01, 0.1, 1.0, 10.0,]}]
model = GridSearchCV(pipeline1, grid_parameters_tune, cv=3, n_jobs=-1, verbose=1)
model.fit(X, y)
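The two keys can also go into a single grid so that both C values are searched together. This is a sketch under assumptions that are not part of the original answer: the data comes from make_classification, and SelectKBest uses k=2 so the filter step is unlikely to receive fewer features than it asks for.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel, SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption for illustration, not the asker's data).
X, y = make_classification(
    n_samples=60, n_features=25, n_informative=5, n_redundant=0, random_state=0
)

pipeline1 = Pipeline(
    [
        ("feature_selection", SelectFromModel(svm.SVC(kernel="linear"))),
        ("filter", SelectKBest(k=2)),  # small k, chosen for this sketch only
        ("classification", svm.SVC(kernel="linear")),
    ]
)

# One dict with both keys: every combination of the two C values is tried.
grid_parameters_tune = {
    "feature_selection__estimator__C": [0.1, 1.0, 10.0],
    "classification__C": [0.1, 1.0, 10.0],
}
model = GridSearchCV(pipeline1, grid_parameters_tune, cv=3)
model.fit(X, y)

# best_params_ reports the chosen value for each key in the grid.
print(model.best_params_)
```

Searching both values at once multiplies the number of fits (here 3 x 3 combinations per fold), so keep the grids small while experimenting.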
FlorianGD answered Sep 02 '25 15:09