How to use scikit-learn inverse_transform with new values

I have a dataset on which I ran scikit-learn's PCA. I scaled the data with StandardScaler() before performing PCA.

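import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# df_data is my original DataFrame (rows are samples, columns are features)
# a float n_components keeps enough components to explain that fraction of the variance
variance_to_retain = 0.99
np_scaled = StandardScaler().fit_transform(df_data)
pca = PCA(n_components=variance_to_retain)
np_pca = pca.fit_transform(np_scaled)

# make dataframe of scaled data
# put column names on scaled data for use later
df_scaled = pd.DataFrame(np_scaled, columns=df_data.columns)
num_components = len(pca.explained_variance_ratio_)
cum_variance_explained = np.cumsum(pca.explained_variance_ratio_)

eigenvalues = pca.explained_variance_
eigenvectors = pca.components_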

I then ran K-Means clustering on the scaled dataset. I can plot the cluster centers just fine in scaled space.

My question is: how do I transform the locations of the centers back into the original data space? I know that StandardScaler.fit_transform() makes the data have zero mean and unit variance. But given new points of shape (num_clusters, num_features), can I use inverse_transform(centers) to transform the centers back into the range and offset of the original data?

Thanks, David

asked Apr 17 '18 18:04

David McCormick

1 Answer

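You can get cluster_centers_ from the fitted KMeans and pass it to pca.inverse_transform, then pass the result to the scaler's inverse_transform. (If you clustered on the scaled data without PCA, the scaler's inverse_transform alone is enough.)

Here's an example: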

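from sklearn import decomposition
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data

# keep a reference to the fitted scaler so it can be inverted later
scal = StandardScaler()
X_scaled = scal.fit_transform(X)

# project the scaled data onto 3 principal components
pca = decomposition.PCA(n_components=3)
pca.fit(X_scaled)
X_pca = pca.transform(X_scaled)

# cluster in PCA space
clf = KMeans(n_clusters=3)
clf.fit(X_pca)

# undo the steps in reverse order: PCA first, then scaling
centers_original = scal.inverse_transform(pca.inverse_transform(clf.cluster_centers_))
print(centers_original)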
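
If K-Means was run directly on the scaled data, as the question describes, rather than on the PCA output, then only the scaler needs to be inverted. Below is a minimal sketch of that case. The iris data stands in for df_data, and the variable names, n_init and random_state are illustrative choices of mine, not part of the original post. It also checks that inverse_transform matches the affine formula x = z * scale_ + mean_.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data  # stand-in for the asker's df_data (assumption)

# keep the fitted scaler so that it can be inverted afterwards
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# cluster in scaled space; random_state is fixed only for reproducibility
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

# centers have shape (num_clusters, num_features), so the scaler accepts them
centers_original = scaler.inverse_transform(kmeans.cluster_centers_)

# the same result by hand: undo the division by scale_, then add back mean_
centers_manual = kmeans.cluster_centers_ * scaler.scale_ + scaler.mean_

print(np.allclose(centers_original, centers_manual))
```

Any array with the same number of columns as the training data can be passed to inverse_transform, so new points such as cluster centers work as well as the original rows.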

Note that sklearn offers several ways to do the fit/transform. You can call StandardScaler().fit_transform(X) in a single expression, but then you have no reference to the fitted scaler, so you can't reuse it on new data or call its inverse_transform.

Alternatively, you can do scal = StandardScaler(), followed by scal.fit(X) and then scal.transform(X).

Or you can do scal.fit_transform(X), which combines the fit and transform steps while keeping the fitted scaler in scal.
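
The three patterns above can be compared side by side. This is a small sketch, again using iris as placeholder data and variable names of my own choosing. It shows that all three produce the same scaled array, and that only the variants which keep a reference to the scaler can map values back.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data  # placeholder data (assumption)

# Pattern 1: one-liner; the fitted scaler is discarded immediately
X_a = StandardScaler().fit_transform(X)

# Pattern 2: separate fit and transform on a scaler we keep
scal_b = StandardScaler()
scal_b.fit(X)
X_b = scal_b.transform(X)

# Pattern 3: fit_transform on a scaler we keep
scal_c = StandardScaler()
X_c = scal_c.fit_transform(X)

# all three give the same scaled values
print(np.allclose(X_a, X_b), np.allclose(X_b, X_c))

# only a kept scaler can undo the scaling
X_back = scal_c.inverse_transform(X_c)
print(np.allclose(X_back, X))
```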

answered Sep 28 '22 17:09

Mohammad Athar

Tags: python, scikit-learn, pca
