(Python - sklearn) How to pass parameters to a custom ModelTransformer class with GridSearchCV

Below is my pipeline. I can't seem to pass parameters to my models through the ModelTransformer class, which I took from this post: http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html

The error message makes sense to me, but I don't know how to fix it. Any ideas? Thanks.

# define a pipeline
pipeline = Pipeline([
    ('vect', DictVectorizer(sparse=False)),
    ('scale', preprocessing.MinMaxScaler()),
    ('ess', FeatureUnion(
        n_jobs=-1,
        transformer_list=[
            ('rfc', ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))),
            ('svc', ModelTransformer(SVC(random_state=1))),
        ],
        transformer_weights=None)),
    ('es', EnsembleClassifier1()),
])

# define the parameters for the pipeline
parameters = {
    'ess__rfc__n_estimators': (100, 200),
}

# ModelTransformer class, taken from
# http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html
class ModelTransformer(TransformerMixin):
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict(X))

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, refit=True)

Error Message: ValueError: Invalid parameter n_estimators for estimator ModelTransformer.

293

asked Jan 07 '15 03:01

nkhuyu

1 Answer

GridSearchCV uses a naming convention for nested objects: a double underscore separates each level. In your case ess__rfc__n_estimators stands for ess.rfc.n_estimators and, according to the definition of the pipeline, it points to the n_estimators property of

ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))

ModelTransformer instances have no such property, which is exactly what the error message says.
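To make the naming convention concrete, here is a minimal sketch that uses only stock scikit-learn objects. PCA and the step names "scale", "ess" and "pca" are stand-ins chosen for illustration, not the classes from the question. It shows that every name GridSearchCV can set is a key of get_params(), and that a name which does not resolve raises the same kind of ValueError as in the question.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler

# Illustrative pipeline: PCA stands in for any transformer inside the union.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("ess", FeatureUnion([("pca", PCA(n_components=2))])),
])

# Every settable name is a key of get_params(); levels are joined by "__".
param_names = sorted(pipeline.get_params(deep=True))
has_nested = "ess__pca__n_components" in param_names

# set_params walks the same path: pipeline -> step "ess" -> transformer "pca".
pipeline.set_params(ess__pca__n_components=1)
new_value = pipeline.named_steps["ess"].transformer_list[0][1].n_components

# A name that does not exist on the final object is rejected.
try:
    pipeline.set_params(ess__pca__bogus=1)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

print(has_nested, new_value)
print(error_message)
```

Printing param_names for your own pipeline is a quick way to find the exact key to put in the parameter grid.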

The fix is easy: to reach the object wrapped by ModelTransformer, go through its model field. The grid parameters become

parameters = {
  'ess__rfc__model__n_estimators': (100, 200),
}

P.S. This is not the only problem with your code. To use multiple jobs in GridSearchCV, every object you use must be copyable. That is achieved by implementing the get_params and set_params methods, which you can inherit from the BaseEstimator mixin.
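A minimal sketch of what that could look like, under a few assumptions that are not in the original code: LogisticRegression stands in for the question's EnsembleClassifier1, the data comes from make_classification, the estimator counts are shrunk so the search runs quickly, and transform returns a NumPy column instead of a DataFrame. Inheriting from BaseEstimator supplies get_params and set_params, so the wrapper can be cloned and the model__ prefix resolves.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline


class ModelTransformer(TransformerMixin, BaseEstimator):
    """Wrap a predictor so that its predictions become a feature column."""

    def __init__(self, model):
        # Stored under the constructor argument's name so get_params finds it.
        self.model = model

    def fit(self, X, y=None, **fit_params):
        self.model.fit(X, y, **fit_params)
        return self

    def transform(self, X):
        # One column holding the wrapped model's predictions.
        return np.asarray(self.model.predict(X)).reshape(-1, 1)


# Toy data and a stand-in final classifier (both are assumptions).
X, y = make_classification(n_samples=60, random_state=0)

pipeline = Pipeline([
    ("ess", FeatureUnion([
        ("rfc", ModelTransformer(
            RandomForestClassifier(n_estimators=10, random_state=1))),
    ])),
    ("es", LogisticRegression()),
])

# Note the extra "model" level between the wrapper and the forest.
parameters = {"ess__rfc__model__n_estimators": (5, 10)}

grid_search = GridSearchCV(pipeline, parameters, cv=3, refit=True)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

Because the wrapper now exposes its model through get_params, GridSearchCV can clone the whole pipeline for each candidate and each fold, which is also what running with n_jobs greater than one relies on.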

163

answered Sep 29 '22 06:09

Artem Sobolev

Tags:

parameter-passing

machine-learning

python-2.7

scikit-learn

cross-validation
