
How to implement inverse transformation in a pipeline of a ColumnTransformer?

I would like to understand how to apply the inverse transformation when the scaler is part of a ColumnTransformer pipeline, rather than calling StandardScaler directly.

The code that I am using is the following:

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# X is assumed to be an existing pandas DataFrame
categoric = X.select_dtypes(['object']).columns
numeric = X.select_dtypes(['int']).columns

tf = ColumnTransformer([('onehot', OneHotEncoder(), categoric),
                        ('scaler', StandardScaler(), numeric)])

X_preprocessed = tf.fit_transform(X)

model = KMeans(n_clusters=2, random_state=24)
model.fit(X_preprocessed)

After fitting a model (KMeans in this case), how can I map the numeric values of X back to their original scale?

I know StandardScaler has a method for this (.inverse_transform), but my question is about how to use it when the scaler sits inside a ColumnTransformer.

P.S.: The objective of doing so is to interpret the centroids of the model.

Jaime Vera asked Oct 26 '22 16:10



1 Answer

You might have already found a solution, but I had a similar issue. I am working with pandas and wanted the ColumnTransformer to return a dataframe again. I do this by putting the column names back in the order in which the ColumnTransformer emits them. To make sure I had not mislabeled any columns, I wanted to invert the transformation and check that it returned the original dataframe.

There are two ways to access the fitted sub-transformers inside your tf:

tf.transformers_[1][1]  # second (name, transformer, columns) tuple; element 1 is the fitted transformer
tf.named_transformers_['scaler']

You can then call inverse_transform on that particular sub-transformer, passing it only the output columns it produced. ColumnTransformer itself does not provide an inverse_transform, so each sub-transformer only undoes its own part; to get the full dataset back you have to invert each part separately and combine the results into one frame again.

Ruben Debien answered Nov 15 '22 06:11