
Is it possible to toggle a certain step in sklearn pipeline?

I wonder if we can set up an "optional" step in sklearn.pipeline. For example, for a classification problem, I may want to try an ExtraTreesClassifier both with and without a PCA transformation ahead of it. In practice, this might be a pipeline with an extra parameter that toggles the PCA step, so that I can optimize over it via GridSearch, etc. I don't see such an implementation in the sklearn source, but is there a workaround?

Furthermore, since the valid parameter values of a later step in the pipeline might depend on the parameters of an earlier step (e.g., valid values of ExtraTreesClassifier.max_features depend on PCA.n_components), is it possible to specify such a conditional dependency in sklearn.pipeline and sklearn.grid_search?

Thank you!

Asked by dolaameng on Oct 09 '13 at 03:10


1 Answer

  • Pipeline steps cannot currently be made optional in a grid search, but as a quick workaround you could wrap the PCA class in your own OptionalPCA component with a boolean parameter that turns PCA off when requested. For more complex search spaces, you might want to look at hyperopt. I think it has good sklearn integration that supports this kind of pattern by default, but I cannot find the docs anymore. Maybe have a look at this talk.

  • For the dependent-parameters problem, GridSearchCV supports trees of parameters to handle this case, as demonstrated in the documentation: param_grid can be a list of dictionaries, and each dictionary is expanded into its own grid.
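
Both suggestions can be combined in one rough sketch. This is only an illustration under some assumptions, not code from the answer: the `enabled` and `n_components` parameter names on OptionalPCA, the step names "pca" and "clf", and the toy data are all made up for the example, and it imports GridSearchCV from sklearn.model_selection (its location in recent releases) rather than sklearn.grid_search.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class OptionalPCA(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: runs PCA when `enabled`, else passes X through."""

    def __init__(self, enabled=True, n_components=None):
        # Store constructor arguments unchanged so get_params/set_params work.
        self.enabled = enabled
        self.n_components = n_components

    def fit(self, X, y=None):
        if self.enabled:
            self.pca_ = PCA(n_components=self.n_components).fit(X)
        return self

    def transform(self, X):
        return self.pca_.transform(X) if self.enabled else X


# Toy data, for illustration only.
X, y = make_classification(
    n_samples=120, n_features=10, n_informative=5, random_state=0
)

pipe = Pipeline([
    ("pca", OptionalPCA()),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# A list of dicts: each dict is expanded into its own grid, so the valid
# max_features values can differ depending on whether PCA is switched on.
param_grid = [
    {"pca__enabled": [False], "clf__max_features": [2, 5, 10]},
    {
        "pca__enabled": [True],
        "pca__n_components": [3, 5],
        "clf__max_features": [1, 3],  # never above the smallest n_components
    },
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

The first dictionary covers the "no PCA" branch with max_features chosen against the 10 raw features; the second covers the PCA branch with values that stay within the reduced dimensionality. Later scikit-learn releases also let a grid replace a whole step with the string "passthrough" (for example `{"pca": ["passthrough", PCA(n_components=3)]}`), which may remove the need for a wrapper class.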

Answered by ogrisel on Oct 03 '22 at 09:10