According to the sklearn.pipeline.Pipeline documentation:
The pipeline has all the methods that the last estimator in the pipeline has, i.e. if the last estimator is a classifier, the Pipeline can be used as a classifier. If the last estimator is a transformer, again, so is the pipeline.
The following example creates a dummy transformer with a custom dummy method f:
class C:
    def fit(self, X, y=None):
        print('fit')
        return self
    def transform(self, X):
        print('transform')
        return X
    def f(self):
        print('abc')

from sklearn.pipeline import Pipeline
ppl = Pipeline([('C', C())])
I was expecting to be able to access the f function of the C transformer; however, calling ppl.f() results in AttributeError: 'Pipeline' object has no attribute 'f'.
Am I misinterpreting the documentation? Is there a good and reliable way to access the last transformer's functions?
The Pipeline documentation slightly overstates things. A Pipeline has all the estimator methods of its last estimator. These include things like predict(), fit_predict(), fit_transform(), transform(), decision_function(), predict_proba(), and so on.
It cannot expose any other methods, because it wouldn't know what to do with all the other steps in the pipeline. In most situations you pass (X) or possibly (X, y), and X and/or y must pass through every step in the pipeline via either fit_transform() or transform().
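A quick way to see the difference is to check which attributes the pipeline actually exposes. This is a minimal sketch, assuming a reasonably recent scikit-learn; the class C here is a variant of the one in the question that inherits from BaseEstimator and TransformerMixin (my addition, to keep it compatible with newer releases) and returns a value instead of printing.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class C(BaseEstimator, TransformerMixin):
    """Dummy transformer with one custom, non-estimator method f."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def f(self):
        return 'abc'


ppl = Pipeline([('C', C())])

# transform is a standard estimator method, so the pipeline delegates it.
has_transform = hasattr(ppl, 'transform')
# f is not part of the estimator API, so the pipeline does not expose it.
has_f = hasattr(ppl, 'f')

print(has_transform, has_f)
```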
It is fairly easy to access the last estimator, like this:
ppl.steps[-1][1].f()
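Besides steps[-1][1], the step can also be looked up by name or by index. The sketch below assumes a scikit-learn version in which Pipeline supports indexing (I believe 0.21 or later); the class C is again a BaseEstimator-based variant of the one in the question, with f returning a value so the result is easy to check.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class C(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def f(self):
        return 'abc'


ppl = Pipeline([('C', C())])

by_steps = ppl.steps[-1][1]     # (name, estimator) tuple, take the estimator
by_name = ppl.named_steps['C']  # lookup by the step name given at construction
by_index = ppl[-1]              # integer indexing returns the estimator itself
by_key = ppl['C']               # string indexing is equivalent to named_steps

# All four expressions refer to the same object.
same_object = by_steps is by_name is by_index is by_key
result = by_name.f()
print(same_object, result)
```

Looking the step up by name is a little more robust than steps[-1][1] if steps are later appended to the pipeline.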
But remember that doing so bypasses the previous steps in the pipeline (i.e., if you pass it X, it won't be scaled by your StandardScaler or whatever else you are doing earlier in the pipeline).
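If the custom method does need the preprocessed data, one option is to run X through the earlier steps yourself before calling it. This is a sketch under assumptions: Summary and column_means are hypothetical names made up for illustration, and slicing a pipeline with ppl[:-1] assumes a scikit-learn version that supports it (I believe 0.21 or later).

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class Summary(BaseEstimator, TransformerMixin):
    """Hypothetical final step with a custom method, column_means."""

    def fit(self, X, y=None):
        self.fitted_ = True  # marker attribute so the step counts as fitted
        return self

    def transform(self, X):
        return X

    def column_means(self, X):
        return np.asarray(X).mean(axis=0)


X = np.array([[1.0, 10.0], [3.0, 30.0]])
ppl = Pipeline([('scale', StandardScaler()), ('summary', Summary())])
ppl.fit(X)

last = ppl.steps[-1][1]

# Calling the custom method directly: X never goes through the scaler.
raw_means = last.column_means(X)

# Applying every step except the last first, then calling the custom method.
X_scaled = ppl[:-1].transform(X)
scaled_means = last.column_means(X_scaled)

print(raw_means, scaled_means)
```

The first call sees the raw column means, while the second sees the means of the standardized columns, which are approximately zero.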