I need to pass a parameter, sample_weight
, to my RandomForestClassifier
like so:
X = np.array([[2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,
1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 3.0,
2.0, '0'],
[15.0, 2.0, 5.0, 5.0, 0.466666666667, 4.0, 3.0, 2.0, 0.0, 0.0, 0.0,
0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
7.0, 14.0, 2.0, '0'],
[3.0, 4.0, 3.0, 1.0, 1.33333333333, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0,
0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
9.0, 8.0, 2.0, '0'],
[3.0, 2.0, 3.0, 0.0, 0.666666666667, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0,
0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
5.0, 3.0, 1.0, '0']], dtype=object)
y = np.array([ 0., 0., 1., 0.])
m = sklearn.ensemble.RandomForestClassifier(
random_state=0,
oob_score=True,
n_estimators=100,
min_samples_leaf=5,
max_depth=10)
m.fit(X, y, sample_weight=np.array([3,4,2,3]))
The above code works perfectly fine. Then, I try to do this in a pipeline object like so, using pipeline object instead of only random forest:
m = sklearn.pipeline.Pipeline([
('feature_selection', sklearn.feature_selection.SelectKBest(
score_func=sklearn.feature_selection.f_regression,
k=25)),
('model', sklearn.ensemble.RandomForestClassifier(
random_state=0,
oob_score=True,
n_estimators=500,
min_samples_leaf=5,
max_depth=10))])
m.fit(X, y, sample_weight=np.array([3,4,2,3]))
Now this breaks in the fit
method with "ValueError: need more than 1 value to unpack
".
ValueError Traceback (most recent call last)
<ipython-input-212-c4299f5b3008> in <module>()
25 max_depth=10))])
26
---> 27 m.fit(X, y, sample_weights=np.array([3,4,2,3]))
/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
128 data, then fit the transformed data using the final estimator.
129 """
--> 130 Xt, fit_params = self._pre_transform(X, y, **fit_params)
131 self.steps[-1][-1].fit(Xt, y, **fit_params)
132 return self
/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
113 fit_params_steps = dict((step, {}) for step, _ in self.steps)
114 for pname, pval in six.iteritems(fit_params):
--> 115 step, param = pname.split('__', 1)
116 fit_params_steps[step][param] = pval
117 Xt = X
ValueError: need more than 1 value to unpack
I am using sklearn
version 0.14
.
I think that the problem is that the F selection
step in the pipeline does not take in an argument for sample_weights. how do I pass this parameter to only one step in the pipeline with I run "fit
"? Thanks.
The only difference is that make_pipeline generates names for steps automatically.
The ColumnTransformer is a class in the scikit-learn Python machine learning library that allows you to selectively apply data preparation transforms.
Python scikit-learn provides a Pipeline utility to help automate machine learning workflows. Pipelines work by allowing for a linear sequence of data transforms to be chained together culminating in a modeling process that can be evaluated.
'make_pipeline' is a utility function that is a shorthand for constructing pipelines. It takes a variable number of estimates and returns a pipeline by filling the names automatically.
Pipelines are extremely useful and versatile objects in the scikit-learn package. They can be nested and combined with other sklearn objects to create repeatable and easily customizable data transformation and modeling workflows.
If you want to pass parameters from Parent pipeline to Child pipelines, all you have to do is add parameters in the parent pipeline add parameters in the child pipeline and now pass the parameters from parents to child while selecting the child pipeline (please see the screenshots).
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
One way to do this is to set sklearn’s display parameter to 'diagram' to show an HTML representation when you call display () on the pipeline object itself. The HTML will be interactive in a Jupyter Notebook, and you can click on each step to expand it and see its current parameters.
From the documentation:
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
So you can simply insert model__
in front of whatever fit parameter kwargs you want to pass to your 'model'
step:
m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With