
Can I use scikit-learn pipeline to transform a specific variable only?

Reading the scikit-learn documentation on Pipeline, I see that all the examples apply the transformers to the entire dataset (e.g. StandardScaler, PCA).

Is it possible to, say, scale only one specific variable in the dataset? If so, I could put my entire feature engineering process into a Pipeline and apply it to both my train and test sets.

Heisenberg asked Nov 29 '25 16:11



1 Answer

You can combine FeatureUnion with custom transformers that each select only the variables you're interested in.
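The FeatureUnion approach could look roughly like the sketch below. ColumnSelector is a hypothetical helper written for this example (it is not part of scikit-learn), and the toy arrays are made up: the first column is scaled, the other two are passed through unchanged.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: keep only the given column indices of a 2-D array."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; selection is stateless.
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.columns]


# Branch 1 selects column 0 and scales it; branch 2 passes columns 1 and 2 through.
# FeatureUnion concatenates the branch outputs side by side, in this order.
features = FeatureUnion([
    ("scaled", Pipeline([
        ("select", ColumnSelector([0])),
        ("scale", StandardScaler()),
    ])),
    ("untouched", ColumnSelector([1, 2])),
])

# Made-up data for illustration only.
X_train = np.array([[1.0, 10.0, 100.0],
                    [2.0, 20.0, 200.0],
                    [3.0, 30.0, 300.0]])
X_test = np.array([[4.0, 40.0, 400.0]])

X_train_out = features.fit_transform(X_train)  # scaler statistics come from train only
X_test_out = features.transform(X_test)        # same statistics reused on test
```

Because the union is itself a transformer, it can be used as one step of a larger Pipeline, so the same fitted preprocessing is applied to both train and test sets.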

However, you're right that scikit-learn does not handle heterogeneous feature sets particularly well. The sklearn-pandas library makes this a lot easier by letting you define separate pipelines for specific columns of a pandas DataFrame.
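As a side note, scikit-learn 0.20 and later include sklearn.compose.ColumnTransformer, which covers the same per-column use case without a custom class. The sketch below uses that class, not sklearn-pandas, and the toy arrays and step name are assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Scale only column 0; remainder="passthrough" keeps the other columns as they are.
# Transformed columns come first in the output, followed by the passed-through ones.
preprocess = ColumnTransformer(
    transformers=[("scale_first", StandardScaler(), [0])],
    remainder="passthrough",
)

# Made-up data for illustration only.
train = np.array([[1.0, 10.0, 100.0],
                  [2.0, 20.0, 200.0],
                  [3.0, 30.0, 300.0]])
test = np.array([[4.0, 40.0, 400.0]])

train_out = preprocess.fit_transform(train)  # fit the scaler on the training data
test_out = preprocess.transform(test)        # reuse the fitted scaler on the test data
```

With a pandas DataFrame as input, the column list can hold column names instead of integer positions.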

Mark Whitfield answered Dec 02 '25 06:12
