I would like to understand how to apply the inverse transformation when the scaler sits inside a ColumnTransformer pipeline, rather than when calling StandardScaler directly.
The code that I am using is the following:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
categoric = X.select_dtypes(['object']).columns
numeric = X.select_dtypes(['int']).columns
tf = ColumnTransformer([('onehot', OneHotEncoder(), categoric),
('scaler', StandardScaler(), numeric)])
X_preprocessed = tf.fit_transform(X)
model = KMeans(n_clusters=2, random_state=24)
model.fit(X_preprocessed)
After getting the output of a given model (KMeans in this case), how can I get back the original scale of the numeric
values of any X dataframe?
I know StandardScaler has a method (.inverse_transform) to do that, but my question is about how to use it in a pipeline built with ColumnTransformer.
P.S.: The objective of doing so is to interpret the centroids of the model.
The ColumnTransformer is a class in the scikit-learn Python machine learning library that allows you to selectively apply data preparation transforms.
Pipeline can be used to chain multiple estimators into one. This is useful as there is often a fixed sequence of steps in processing the data, for example feature selection, normalization and classification.
Use the scikit-learn ColumnTransformer class to apply preprocessing steps such as MinMaxScaler and OneHotEncoder to numeric and categorical features at the same time. ColumnTransformer bundles all the transformations into one object that can be used with scikit-learn pipelines.
ColumnTransformer applies transformers to columns of an array or pandas DataFrame. This estimator allows different columns or column subsets of the input to be transformed separately, and the features generated by each transformer are concatenated to form a single feature space.
You might have already found a solution, but I had a similar issue. I am working with pandas and wanted the ColumnTransformer to return a dataframe again. I do this by putting the column names back in the order in which the ColumnTransformer uses them. To make sure I hadn't mislabeled any columns, I wanted to invert the transformation and check that it returned the original dataframe.
There are 2 ways to access the sub-transformers inside your tf:
tf.transformers_[1][1] # second (name, transformer, columns) tuple; its 2nd item is the fitted transformer object
tf.named_transformers_['scaler']
You can then call inverse_transform on that particular sub-transformer, passing it only the columns of the transformed output that it produced. This inverts one transformer at a time, so you then have to reconstruct your dataset by combining the results of both into one frame again.
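Applied to the KMeans centroids from the question, this could look roughly like the sketch below. It assumes the transformer order from the question (one-hot columns first, then the scaled numeric columns); the toy data, the column names and the n_init value are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the real X
X = pd.DataFrame({
    "color": ["red", "blue", "red", "blue", "red", "blue"],
    "size": [1, 2, 1, 10, 11, 12],
    "weight": [100, 110, 105, 300, 310, 305],
})
categoric = ["color"]
numeric = ["size", "weight"]

tf = ColumnTransformer([("onehot", OneHotEncoder(), categoric),
                        ("scaler", StandardScaler(), numeric)])
X_preprocessed = tf.fit_transform(X)

model = KMeans(n_clusters=2, random_state=24, n_init=10)
model.fit(X_preprocessed)

# Fitted sub-transformers
onehot = tf.named_transformers_["onehot"]
scaler = tf.named_transformers_["scaler"]

# The one-hot block comes first in the output, so count its columns
n_onehot = sum(len(cats) for cats in onehot.categories_)

# Undo the scaling only on the numeric part of each centroid
centers = model.cluster_centers_
numeric_centers = scaler.inverse_transform(centers[:, n_onehot:])
centroids = pd.DataFrame(numeric_centers, columns=numeric)
print(centroids)

# Sanity check: inverting the scaled data should give back the original values
X_dense = X_preprocessed.toarray() if hasattr(X_preprocessed, "toarray") else np.asarray(X_preprocessed)
restored = scaler.inverse_transform(X_dense[:, n_onehot:])
```

The one-hot part of each centroid (centers[:, :n_onehot]) is left as is: for KMeans it is the share of each category among the points in that cluster, so it can be read directly without inverting anything.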