I would like to apply GridSearchCV to a scikit-learn pipeline ([feature selection] + [algorithm]), but it gives the following error. How can I correct the code?
 from sklearn import svm
 from sklearn.model_selection import GridSearchCV
 from sklearn.pipeline import Pipeline
 from sklearn.feature_selection import SelectKBest
 from sklearn.feature_selection import SelectFromModel
 pipeline1 = Pipeline([ 
    ('feature_selection', SelectFromModel(svm.SVC(kernel='linear'))),
    ('filter'           , SelectKBest(k=11)),
    ('classification'   , svm.SVC(kernel='linear'))
                ])
 grid_parameters_tune = [{'estimator__C': [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]}]
 model = GridSearchCV(pipeline1, grid_parameters_tune, cv=5, n_jobs=-1, 
                   verbose=1)
 model.fit(X, y)
ValueError: Invalid parameter estimator for estimator Pipeline(memory=None,
steps=[('feature_union', FeatureUnion(n_jobs=None,
transformer_list=[('filter', SelectKBest(k=10, score_func=<function f_classif at 0x000001ECCBB3E840>)), ('feature_selection', SelectFromModel(estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', ...r', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False))]). Check the list of available parameters with `estimator.get_params().keys()`.
I think the error comes from the name in your grid_parameters_tune. You are trying to set estimator__C, but there is no step named estimator in your pipeline. Renaming it to classification__C should do the trick.
If you want to access the C parameter of the SVC inside SelectFromModel, you can do so with feature_selection__estimator__C.
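If you are unsure which names are valid, the error message's own hint (`estimator.get_params().keys()`) can be applied to the pipeline itself. A minimal sketch, assuming the same three steps as in the question; the filter on names ending in `__C` is only there to keep the output short:

```python
from sklearn import svm
from sklearn.feature_selection import SelectFromModel, SelectKBest
from sklearn.pipeline import Pipeline

pipeline1 = Pipeline(
    [
        ("feature_selection", SelectFromModel(svm.SVC(kernel="linear"))),
        ("filter", SelectKBest(k=11)),
        ("classification", svm.SVC(kernel="linear")),
    ]
)

# Every key returned by get_params() is a valid name for the parameter grid.
param_names = sorted(pipeline1.get_params().keys())

# Keep only the names that refer to a C parameter, to keep the output short.
c_params = [name for name in param_names if name.endswith("__C")]
print(c_params)
```

Each name is built as `<step name>__<parameter>`, with one more `__estimator__` level for the SVC wrapped by SelectFromModel, which is why a bare `estimator__C` is rejected.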
Below is a working example with random data. I changed some of the parameters from your original pipeline to save time, so do not necessarily copy it directly.
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.feature_selection import SelectFromModel, SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
X = pd.DataFrame(data=np.arange(1000).reshape(-1, 25))
y = np.random.binomial(1, 0.5, 1000//25)
pipeline1 = Pipeline(
    [
        ("feature_selection", SelectFromModel(svm.SVC(kernel="linear"))),
        ("filter", SelectKBest(k=11)),
        ("classification", svm.SVC(kernel="linear")),
    ]
)
grid_parameters_tune = [{"classification__C": [0.01, 0.1, 1.0, 10.0,]}]
model = GridSearchCV(pipeline1, grid_parameters_tune, cv=3, n_jobs=-1, verbose=1)
model.fit(X, y)
As for the second way, tuning the C of the SVC inside SelectFromModel:
pipeline1 = Pipeline(
    [
        ("feature_selection", SelectFromModel(svm.SVC(kernel="linear"))),
        ("filter", SelectKBest(k=11)),
        ("classification", svm.SVC(kernel="linear")),
    ]
)
grid_parameters_tune = [{"feature_selection__estimator__C": [0.01, 0.1, 1.0, 10.0,]}]
model = GridSearchCV(pipeline1, grid_parameters_tune, cv=3, n_jobs=-1, verbose=1)
model.fit(X, y)
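The two names can also go into the same grid, so that both C values are searched together. This is only a sketch under some assumptions of mine that are not in the original code: the data comes from `make_classification`, SelectFromModel is given `threshold=-np.inf` and `max_features=15` so that it always passes at least 11 features on to SelectKBest, and `n_jobs` is left at its default.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel, SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption, used only to make the sketch runnable).
X, y = make_classification(
    n_samples=120, n_features=25, n_informative=8, random_state=0
)

pipeline1 = Pipeline(
    [
        # threshold=-np.inf with max_features=15 keeps exactly the 15
        # highest-ranked features, so SelectKBest(k=11) always has enough.
        (
            "feature_selection",
            SelectFromModel(
                svm.SVC(kernel="linear"), threshold=-np.inf, max_features=15
            ),
        ),
        ("filter", SelectKBest(k=11)),
        ("classification", svm.SVC(kernel="linear")),
    ]
)

# One dict with both keys: every combination of the two C values is tried.
grid_parameters_tune = [
    {
        "feature_selection__estimator__C": [0.1, 1.0],
        "classification__C": [0.1, 1.0, 10.0],
    }
]

model = GridSearchCV(pipeline1, grid_parameters_tune, cv=3, verbose=1)
model.fit(X, y)

print(model.best_params_)
```

With 2 x 3 values, the search evaluates 6 candidates; `model.best_params_` then holds the chosen value for each of the two names.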