I wonder if we can set up an "optional" step in sklearn.pipeline. For example, for a classification problem, I may want to try an ExtraTreesClassifier with and without a PCA transformation ahead of it. In practice, it might be a pipeline with an extra parameter that toggles the PCA step, so that I can optimize over it via GridSearch and so on. I don't see such an implementation in the sklearn source, but is there any workaround?
Furthermore, since the valid parameter values of a later step in the pipeline might depend on the parameters of an earlier step (e.g., valid values of ExtraTreesClassifier.max_features depend on PCA.n_components), is it possible to specify such a conditional dependency in sklearn.pipeline and sklearn.grid_search?
Thank you!
Pipeline steps cannot currently be made optional in a grid search, but as a quick workaround you could wrap the PCA class in your own OptionalPCA component with a boolean parameter that turns PCA off when requested. You might want to have a look at hyperopt to set up more complex search spaces. I think it has good sklearn integration to support this kind of pattern by default, but I cannot find the doc anymore. Maybe have a look at this talk.
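A minimal sketch of what such a wrapper could look like. The class name OptionalPCA comes from the suggestion above, but the parameter names use_pca and n_components, the toy data, and the grid values are assumptions made for illustration. It also assumes a scikit-learn release where GridSearchCV lives in sklearn.model_selection rather than the older sklearn.grid_search.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class OptionalPCA(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: applies PCA only when use_pca is True."""

    def __init__(self, use_pca=True, n_components=None):
        # Store constructor arguments unchanged so get_params/set_params work.
        self.use_pca = use_pca
        self.n_components = n_components

    def fit(self, X, y=None):
        if self.use_pca:
            # Fit the inner PCA only when the step is switched on.
            self.pca_ = PCA(n_components=self.n_components).fit(X)
        return self

    def transform(self, X):
        if self.use_pca:
            return self.pca_.transform(X)
        # Step switched off: pass the data through untouched.
        return X


# Toy data, purely illustrative.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

pipe = Pipeline([
    ("pca", OptionalPCA()),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# The boolean toggle becomes an ordinary grid-search dimension.
search = GridSearchCV(pipe, {"pca__use_pca": [True, False]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Because the toggle is a plain constructor parameter, it is addressed with the usual step__parameter naming and needs no special support from Pipeline.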
For the dependent parameters problem, GridSearchCV supports trees of parameters to handle this case, as demonstrated in the documentation.
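One way to read "trees of parameters" is that param_grid may be a list of dictionaries, each searched as its own sub-grid, so that max_features values are only paired with compatible n_components values. The sketch below uses made-up toy data and grid values. The third sub-grid assumes a newer scikit-learn release in which a pipeline step can be replaced by the string "passthrough", which would also cover the "without PCA" case.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data with 10 features, purely illustrative.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

pipe = Pipeline([
    ("pca", PCA()),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# Each dict is an independent sub-grid, so max_features never exceeds
# the number of columns produced by the preceding step.
param_grid = [
    {"pca__n_components": [2], "clf__max_features": [1, 2]},
    {"pca__n_components": [5], "clf__max_features": [1, 2, 5]},
    # Assumption: "passthrough" is accepted by the installed version.
    {"pca": ["passthrough"], "clf__max_features": [2, 5, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

The sub-grids are simply concatenated, so this search evaluates 2 + 3 + 3 = 8 candidate settings rather than the full cross product.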