How to implement inverse transformation in a pipeline of a ColumnTransformer?

Tags: python, scikit-learn, pipeline

I would like to understand how to apply an inverse transformation when the scaler is part of a ColumnTransformer pipeline, rather than when StandardScaler is used directly.

The code that I am using is the following:

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# X is a pandas DataFrame with both categorical and integer columns
categoric = X.select_dtypes(['object']).columns
numeric = X.select_dtypes(['int']).columns

# One-hot encode the categorical columns and standardize the numeric ones
tf = ColumnTransformer([('onehot', OneHotEncoder(), categoric),
                        ('scaler', StandardScaler(), numeric)])

X_preprocessed = tf.fit_transform(X)

# Cluster in the transformed feature space
model = KMeans(n_clusters=2, random_state=24)
model.fit(X_preprocessed)

After getting the output of a given model (KMeans in this case), how can I get back the original scale of the numeric values of any X dataframe?

I know StandardScaler has a method (.inverse_transform) to do that, but my question is how to use it when the scaler sits inside a ColumnTransformer.

P.S.: The objective of doing so is to interpret the centroids of the model.


asked Oct 26 '22 16:10

Jaime Vera

1 Answer

You might have already found a solution, but I had a similar issue. I am working with pandas and wanted the ColumnTransformer to return a DataFrame again. I do this by putting the column names back in the order in which the ColumnTransformer uses them. To make sure I hadn't mislabeled any columns, I wanted to invert the transformation and check that it returned the original DataFrame.

There are two ways to access the fitted sub-transformers inside your tf:

tf.transformers_[1][1]  # second (name, transformer, columns) tuple; item 1 is the fitted transformer instance
tf.named_transformers_['scaler']  # the same object, looked up by name

You can then call inverse_transform on that particular sub-transformer. This inverts only the columns handled by that one transformer, so to reconstruct your dataset you have to invert each block separately and append the results into one frame again.


answered Nov 15 '22 06:11

Ruben Debien
