Below is my pipeline and it seems that I can't pass the parameters to my models by using the ModelTransformer class, which I take it from the link (http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html)
The error message makes sense to me, but I don't know how to fix this. Any idea how to fix this? Thanks.
# define a pipeline
pipeline = Pipeline([
('vect', DictVectorizer(sparse=False)),
('scale', preprocessing.MinMaxScaler()),
('ess', FeatureUnion(n_jobs=-1,
transformer_list=[
('rfc', ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))),
('svc', ModelTransformer(SVC(random_state=1))),],
transformer_weights=None)),
('es', EnsembleClassifier1()),
])
# define the parameters for the pipeline
parameters = {
'ess__rfc__n_estimators': (100, 200),
}
# ModelTransformer class. It takes it from the link
(http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html)
class ModelTransformer(TransformerMixin):
def __init__(self, model):
self.model = model
def fit(self, *args, **kwargs):
self.model.fit(*args, **kwargs)
return self
def transform(self, X, **transform_params):
return DataFrame(self.model.predict(X))
grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, refit=True)
Error Message: ValueError: Invalid parameter n_estimators for estimator ModelTransformer.
Cross-Validation and GridSearchCV Cross-Validation is used while training the model. As we know that before training the model with data, we divide the data into two parts – train data and test data. In cross-validation, the process divides the train data further into two parts – the train data and the validation data.
cv: number of cross-validation you have to try for each selected set of hyperparameters. verbose: you can set it to 1 to get the detailed print out while you fit the data to GridSearchCV.
GridSearchCV tries all the combinations of the values passed in the dictionary and evaluates the model for each combination using the Cross-Validation method. Hence after using this function we get accuracy/loss for every combination of hyperparameters and we can choose the one with the best performance.
GridSearchCV
has a special naming convention for nested objects. In your case ess__rfc__n_estimators
stands for ess.rfc.n_estimators
, and, according to the definition of the pipeline
, it points to the property n_estimators
of
ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100)))
Obviously, ModelTransformer
instances don't have such property.
The fix is easy: in order to access underlying object of ModelTransformer
one needs to use model
field. So, grid parameters become
parameters = {
'ess__rfc__model__n_estimators': (100, 200),
}
P.S. it's not the only problem with your code. In order to use multiple jobs in GridSearchCV, you need to make all objects you're using copy-able. This is achieved by implementing methods get_params
and set_params
, you can borrow them from BaseEstimator
mixin.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With