
PCA inverse transform manually

I am using scikit-learn. The nature of my application is such that I do the fitting offline and can then only use the resulting coefficients online (on the fly) to manually calculate various objectives.

The transform is simple: it is just data * pca.components_, i.e. a simple dot product. However, I have no idea how to perform the inverse transform. Which field of the pca object contains the relevant coefficients for the inverse transform? How do I calculate the inverse transform?

Specifically, I am referring to the PCA.inverse_transform() method of the sklearn.decomposition.PCA class: how can I manually reproduce its functionality using the coefficients calculated by the PCA?

Baron Yugovich asked Sep 23 '15 23:09





1 Answer

1) transform is not data * pca.components_.

Firstly, * is not the dot product for numpy arrays. It is element-wise multiplication. To perform a dot product, you need to use np.dot.
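To make the difference concrete, here is a small sketch with two made-up 2x2 arrays (the names a and b are purely illustrative and not part of the question):

```python
import numpy as np

# Illustrative arrays only
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

elementwise = a * b             # multiplies matching entries: [[5, 12], [21, 32]]
matrix_product = np.dot(a, b)   # rows of a times columns of b: [[19, 22], [43, 50]]
```

For 2-D arrays, a @ b gives the same result as np.dot(a, b).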

Secondly, the shape of pca.components_ is (n_components, n_features) while the shape of the data to transform is (n_samples, n_features), so you need to transpose pca.components_ to perform the dot product.

Moreover, the first step of transform is to subtract the mean, so if you do it manually, you need to subtract the mean first as well.

The correct way to transform is:

data_reduced = np.dot(data - pca.mean_, pca.components_.T)
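One way to convince yourself of this is to compare the manual expression with pca.transform on some data. The sketch below uses randomly generated data (an assumption for illustration, shifted so the mean is clearly non-zero) and a PCA fitted with the default whiten=False:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data only: 50 samples, 4 features, deliberately non-zero mean
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4)) + 10.0

pca = PCA(n_components=2)  # default whiten=False
pca.fit(data)

# Manual transform: center with the fitted mean, then project onto the components
manual = np.dot(data - pca.mean_, pca.components_.T)

# True if the manual result agrees with sklearn's own transform
matches_transform = np.allclose(manual, pca.transform(data))
```

If the PCA was fitted with whiten=True, sklearn additionally divides each projected column by the square root of pca.explained_variance_, so the plain expression above would no longer match.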

2) inverse_transform is just the inverse process of transform: project back through the components, then add the mean.

data_original = np.dot(data_reduced, pca.components_) + pca.mean_
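The same kind of check works for the inverse direction. This sketch again assumes randomly generated illustrative data and the default whiten=False. It also shows that when fewer components than features are kept, the result is an approximation of the original data rather than an exact copy:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data only: 50 samples, 4 features
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4)) + 10.0

pca = PCA(n_components=2).fit(data)
data_reduced = pca.transform(data)

# Manual inverse transform: map back to feature space, then restore the mean
manual_inverse = np.dot(data_reduced, pca.components_) + pca.mean_

# True if the manual result agrees with sklearn's own inverse_transform
matches_inverse = np.allclose(manual_inverse, pca.inverse_transform(data_reduced))

# Mean squared difference from the original data; non-zero because
# only 2 of the 4 components were kept
reconstruction_error = np.mean((data - manual_inverse) ** 2)
```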

If each column of your data already has zero mean, pca.mean_ is (approximately) zero and you can drop it from both expressions above. For example:

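import numpy as np
from sklearn.decomposition import PCA

# Illustrative data only: 100 samples, 5 features, centered so each column has zero mean
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
data -= data.mean(axis=0)

pca = PCA(n_components=3)
pca.fit(data)

data_reduced = np.dot(data, pca.components_.T)  # transform
data_original = np.dot(data_reduced, pca.components_)  # inverse_transform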
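For the offline/online split described in the question, one possible arrangement is to export pca.mean_ and pca.components_ (plus pca.explained_variance_ if whiten=True was used) and apply them online with plain numpy. The helper names manual_transform and manual_inverse_transform below are hypothetical, and the coefficients are stand-ins generated for illustration rather than taken from a fitted model:

```python
import numpy as np

def manual_transform(x, mean, components, explained_variance=None):
    """Project x of shape (n_samples, n_features) onto the components."""
    reduced = np.dot(x - mean, components.T)
    if explained_variance is not None:
        # Only needed for a model fitted with whiten=True
        reduced = reduced / np.sqrt(explained_variance)
    return reduced

def manual_inverse_transform(reduced, mean, components, explained_variance=None):
    """Map reduced data of shape (n_samples, n_components) back to feature space."""
    if explained_variance is not None:
        # Undo the whitening scale before projecting back
        reduced = reduced * np.sqrt(explained_variance)
    return np.dot(reduced, components) + mean

# Stand-in coefficients (assumption): in practice these would be the saved
# pca.mean_ and pca.components_ from the offline fit
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
components = q[:2]                      # orthonormal rows, shape (2, 4)
mean = np.array([1.0, 2.0, 3.0, 4.0])

x = rng.normal(size=(3, 4))             # illustrative online samples
reduced = manual_transform(x, mean, components)
restored = manual_inverse_transform(reduced, mean, components)
```

Keeping the whitening step optional means the same two helpers cover both the default case and a whitened model, at the cost of exporting one extra array.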

yangjie answered Sep 16 '22 12:09
