
Insert or delete a step in scikit-learn Pipeline

Is it possible to delete or insert a step in a sklearn.pipeline.Pipeline object?

I am trying to run a grid search with and without one step in the Pipeline object, and I am wondering whether I can insert or delete a step in the pipeline. In the Pipeline source code I saw that there is a self.steps object holding all the steps, and the steps can also be accessed by name through the named_steps attribute. Before modifying it, I want to make sure I do not cause unexpected effects.

Here is some example code:

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('svm', SVC())]
clf = Pipeline(estimators)
clf 

Is it possible to do something like steps = clf.steps, then insert into or delete from this list? Would this cause undesired effects on the clf object?

Bin asked Dec 16 '15 23:12


3 Answers

I see that everyone mentioned only the delete step. In case you want to also insert a step in the pipeline:

pipe.steps.append(('step name', transformer()))

pipe.steps is a plain Python list of (name, estimator) tuples, so you can also insert an item at a specific position:

pipe.steps.insert(1, ('estimator', transformer()))  # insert as second step
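
As a minimal sketch of the insertion described above: the StandardScaler step and the step names below are illustrative assumptions, not part of the original answer.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([('reduce_dim', PCA()), ('svm', SVC())])

# Insert a scaler in front of PCA (StandardScaler is only an example choice).
# Each entry in pipe.steps is a (name, estimator) tuple.
pipe.steps.insert(0, ('scale', StandardScaler()))

step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['scale', 'reduce_dim', 'svm']
```

If the pipeline was already fitted, it should be refitted after the list is changed, since the new step has not seen any data.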
HonzaB answered Nov 06 '22 11:11


Based on rudimentary testing, you can safely remove a step from a scikit-learn pipeline just as you would remove any list item, with a simple

clf_pipeline.steps.pop(n)

where n is the position of the individual estimator you are trying to remove.
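
A small sketch of the removal, followed by an alternative that may suit the grid-search case from the question: instead of mutating the list, a step can be replaced with the string 'passthrough' (this assumes scikit-learn 0.21 or later). The toy data and the n_components value are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data, only so the pipelines can be fitted.
X, y = make_classification(n_samples=60, n_features=8, random_state=0)

# Option 1: remove the step from the list; pop returns the removed tuple.
clf_pipeline = Pipeline([('reduce_dim', PCA(n_components=3)), ('svm', SVC())])
removed_name, removed_step = clf_pipeline.steps.pop(0)
remaining = [name for name, _ in clf_pipeline.steps]
clf_pipeline.fit(X, y)  # refit after changing the steps

# Option 2: leave the step list alone and let the grid search switch the
# step on and off. 'passthrough' makes the step a no-op.
pipe = Pipeline([('reduce_dim', PCA(n_components=3)), ('svm', SVC())])
param_grid = {'reduce_dim': ['passthrough', PCA(n_components=3)]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Option 2 keeps the original pipeline object untouched, which can be easier to reason about than editing steps in place.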

labelmaker answered Nov 06 '22 13:11


Just chiming in because I feel the other answers cover adding steps to a pipeline really well, but don't really cover how to build a pipeline from only some of the existing steps.

Be careful with my approach, though: slicing the step list is a little unintuitive.

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures

estimators = [('reduce_dim', PCA()), ('poly', PolynomialFeatures()), ('svm', SVC())]
clf = Pipeline(estimators)

If you want to create a pipeline with just the PCA and PolynomialFeatures steps, you can slice the step list by index and pass the slice to Pipeline:

clf1 = Pipeline(clf.steps[0:2])

Want to use just steps 2 and 3? Keep in mind that slice indexes are zero-based and the end index is exclusive, so the slice is [1:3]:

clf2 = Pipeline(clf.steps[1:3])

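Want to use just steps 1 and 3? Adding the two entries together does not work. Each entry is a (name, estimator) tuple, so + concatenates them into one flat 4-tuple instead of a list of two pairs:

clf3 = Pipeline(clf.steps[0] + clf.steps[2])  # fails, at construction or at fit time depending on the scikit-learn version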
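
Picking non-adjacent steps does seem to work if the chosen entries are collected into a new list rather than added together. A minimal sketch, reusing the step names from the example above (the `keep` set is a hypothetical helper name):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

estimators = [('reduce_dim', PCA()), ('poly', PolynomialFeatures()), ('svm', SVC())]
clf = Pipeline(estimators)

# Wrap the selected (name, estimator) tuples in a list.
clf3 = Pipeline([clf.steps[0], clf.steps[2]])

# Equivalent selection by name instead of by position.
keep = {'reduce_dim', 'svm'}
clf3_by_name = Pipeline([(name, est) for name, est in clf.steps if name in keep])

clf3_names = [name for name, _ in clf3.steps]
print(clf3_names)  # ['reduce_dim', 'svm']
```

Note that the new pipelines share the same estimator objects as clf. If independent copies are needed, each estimator can be passed through sklearn.base.clone first.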
plumbus_bouquet answered Nov 06 '22 13:11