How to pass a parameter to only one step of a Pipeline object in scikit-learn?

Tags: python, pandas, scikit-learn, pipeline

I need to pass a parameter, sample_weight, to my RandomForestClassifier like so:

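import numpy as np
import sklearn.ensemble
import sklearn.feature_selection
import sklearn.pipeline

X = np.array([[2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,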
        1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 3.0,
        2.0, '0'],
       [15.0, 2.0, 5.0, 5.0, 0.466666666667, 4.0, 3.0, 2.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
        7.0, 14.0, 2.0, '0'],
       [3.0, 4.0, 3.0, 1.0, 1.33333333333, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
        9.0, 8.0, 2.0, '0'],
       [3.0, 2.0, 3.0, 0.0, 0.666666666667, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
        5.0, 3.0, 1.0, '0']], dtype=object)

y = np.array([ 0.,  0.,  1.,  0.])

m = sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=100,
        min_samples_leaf=5, 
        max_depth=10)

m.fit(X, y, sample_weight=np.array([3,4,2,3]))

The above code works fine. Then I try to do the same thing with a Pipeline object wrapping the random forest:

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=500,
        min_samples_leaf=5, 
        max_depth=10))])

m.fit(X, y, sample_weight=np.array([3,4,2,3]))

Now this breaks in the fit method with "ValueError: need more than 1 value to unpack".

ValueError                                Traceback (most recent call last)
<ipython-input-212-c4299f5b3008> in <module>()
     25         max_depth=10))])
     26 
---> 27 m.fit(X, y, sample_weights=np.array([3,4,2,3]))

/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
    128         data, then fit the transformed data using the final estimator.
    129         """
--> 130         Xt, fit_params = self._pre_transform(X, y, **fit_params)
    131         self.steps[-1][-1].fit(Xt, y, **fit_params)
    132         return self

/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
    113         fit_params_steps = dict((step, {}) for step, _ in self.steps)
    114         for pname, pval in six.iteritems(fit_params):
--> 115             step, param = pname.split('__', 1)
    116             fit_params_steps[step][param] = pval
    117         Xt = X

ValueError: need more than 1 value to unpack

I am using sklearn version 0.14.
I think the problem is that the feature selection step (the F-test) in the pipeline does not accept a sample_weight argument. How do I pass this parameter to only one step in the pipeline when I run "fit"? Thanks.

asked Feb 25 '16 16:02

makansij

1 Answer

From the documentation:

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.

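As the traceback shows, Pipeline.fit splits the name of every fit keyword on '__' to work out which step it belongs to. The bare name sample_weight contains no '__', so the split yields a single value and the unpacking raises the ValueError. To send a fit parameter to your 'model' step, prefix its name with model__: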

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
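To see why the prefix matters, here is a minimal pure-Python sketch of the routing logic, modeled on the _pre_transform lines visible in the traceback. The function name route_fit_params is hypothetical and this is not the library's actual implementation, which differs in newer scikit-learn releases. It only illustrates the 'step__param' splitting.

```python
def route_fit_params(step_names, **fit_params):
    """Group 'step__param' keyword arguments by pipeline step name.

    Hypothetical helper: a simplified imitation of the loop shown in the
    traceback, not scikit-learn code.
    """
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        # Split once on the first double underscore, so that
        # 'model__sample_weight' becomes ('model', 'sample_weight').
        step, param = key.split('__', 1)
        routed[step][param] = value
    return routed


steps = ['feature_selection', 'model']

# Prefixed keyword: routed to the 'model' step only.
routed = route_fit_params(steps, model__sample_weight=[3, 4, 2, 3])
print(routed)

# Unprefixed keyword: there is no '__' to split on, so the two-name
# unpacking fails with a ValueError, as in the question.
try:
    route_fit_params(steps, sample_weight=[3, 4, 2, 3])
except ValueError as exc:
    print('ValueError:', exc)
```

With the prefix, the 'feature_selection' step receives no fit parameters at all, which is what the question asks for: SelectKBest never sees sample_weight.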

answered Sep 17 '22 00:09

ali_m
