
How to pass a parameter to only one part of a pipeline object in scikit learn?

I need to pass a parameter, sample_weight, to my RandomForestClassifier like so:

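import numpy as np
import sklearn.ensemble
import sklearn.feature_selection
import sklearn.pipeline

X = np.array([[2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,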
        1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 3.0,
        2.0, '0'],
       [15.0, 2.0, 5.0, 5.0, 0.466666666667, 4.0, 3.0, 2.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
        7.0, 14.0, 2.0, '0'],
       [3.0, 4.0, 3.0, 1.0, 1.33333333333, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
        9.0, 8.0, 2.0, '0'],
       [3.0, 2.0, 3.0, 0.0, 0.666666666667, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
        5.0, 3.0, 1.0, '0']], dtype=object)

y = np.array([ 0.,  0.,  1.,  0.])

m = sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=100,
        min_samples_leaf=5, 
        max_depth=10)

m.fit(X, y, sample_weight=np.array([3,4,2,3]))

The above code works perfectly fine. Then I tried to do the same thing with a pipeline object in place of the bare random forest:

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=500,
        min_samples_leaf=5, 
        max_depth=10))])

m.fit(X, y, sample_weight=np.array([3,4,2,3]))

Now this breaks in the fit method with "ValueError: need more than 1 value to unpack".

ValueError                                Traceback (most recent call last)
<ipython-input-212-c4299f5b3008> in <module>()
     25         max_depth=10))])
     26 
---> 27 m.fit(X, y, sample_weights=np.array([3,4,2,3]))

/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
    128         data, then fit the transformed data using the final estimator.
    129         """
--> 130         Xt, fit_params = self._pre_transform(X, y, **fit_params)
    131         self.steps[-1][-1].fit(Xt, y, **fit_params)
    132         return self

/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
    113         fit_params_steps = dict((step, {}) for step, _ in self.steps)
    114         for pname, pval in six.iteritems(fit_params):
--> 115             step, param = pname.split('__', 1)
    116             fit_params_steps[step][param] = pval
    117         Xt = X

ValueError: need more than 1 value to unpack

I am using sklearn version 0.14.
I think the problem is that the feature selection step in the pipeline does not accept a sample_weight argument. How do I pass this parameter to only one step of the pipeline when I run "fit"? Thanks.

makansij asked Feb 25 '16 16:02




1 Answer

From the documentation:

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
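
To make the '__' convention concrete, here is a minimal pure-Python sketch that imitates the two lines shown in the question's traceback (the pname.split('__', 1) call and the per-step dictionary). The function name route_fit_params is hypothetical and is not part of scikit-learn; it only mirrors the splitting logic, under the assumption that every keyword name is expected to carry a step prefix.

```python
def route_fit_params(step_names, fit_params):
    """Group fit keyword arguments by step name (illustrative helper, not sklearn API)."""
    routed = {name: {} for name in step_names}
    for pname, pval in fit_params.items():
        # 'model__sample_weight' splits into ('model', 'sample_weight').
        # A name without '__' splits into a single piece, so unpacking it
        # into two variables raises ValueError, as in the traceback.
        step, param = pname.split("__", 1)
        routed[step][param] = pval
    return routed


steps = ["feature_selection", "model"]

# Prefixed name: the weight ends up in the 'model' bucket only.
routed = route_fit_params(steps, {"model__sample_weight": [3, 4, 2, 3]})
print(routed)

# Unprefixed name: there is no step to route to, so the split cannot be unpacked.
try:
    route_fit_params(steps, {"sample_weight": [3, 4, 2, 3]})
except ValueError as exc:
    print("unprefixed name fails:", exc)
```

The point of the sketch is that the pipeline never guesses which step a fit parameter belongs to; the prefix is the only routing information it has.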

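Pipeline.fit splits every keyword argument name on '__' to find the step it belongs to, which is why a bare sample_weight fails to unpack. So you can simply put model__ in front of whatever fit keyword argument you want to pass to your 'model' step: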

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
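
For a self-contained check, the following sketch builds a small pipeline with the same step names and passes the weights through the model__ prefix. It assumes a reasonably recent scikit-learn with default settings rather than version 0.14; the synthetic data, the variable names ending in _demo, and the use of f_classif in place of the question's f_regression are choices made for this illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Synthetic numeric data (an assumption for the demo, not the question's X).
rng = np.random.RandomState(0)
X_demo = rng.rand(40, 6)
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1.0).astype(int)
weights_demo = rng.randint(1, 5, size=40)

pipe = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=3)),
    ("model", RandomForestClassifier(n_estimators=20, random_state=0)),
])

# The 'model__' prefix sends sample_weight to the final step only;
# the feature selection step is fitted without it.
pipe.fit(X_demo, y_demo, model__sample_weight=weights_demo)

predictions = pipe.predict(X_demo)
print(predictions.shape)
```

The prefix has to match the step name given in the Pipeline constructor, so a step called 'clf' would need clf__sample_weight instead.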
ali_m answered Sep 17 '22 00:09
