PCA inverse transform manually

Tags:

python

numpy

scikit-learn

pca

I am using scikit-learn. The nature of my application is such that I do the fitting offline, and then can only use the resulting coefficients online (on the fly), to manually calculate various objectives.

The transform is simple: it is just data * pca.components_, i.e. a simple dot product. However, I have no idea how to perform the inverse transform. Which field of the pca object contains the relevant coefficients for the inverse transform? How do I calculate the inverse transform?

Specifically, I am referring to the PCA.inverse_transform() method of the sklearn.decomposition.PCA class: how can I manually reproduce its functionality using the coefficients computed by the PCA?

521

asked Sep 23 '15 23:09

Baron Yugovich

1 Answer

1) transform is not data * pca.components_.

Firstly, * is not a dot product for NumPy arrays; it is element-wise multiplication. To compute a dot product, use np.dot (or the @ operator).
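A quick NumPy check of the difference, using small arrays chosen only for illustration (their shapes mimic data of shape (n_samples, n_features) and components_ of shape (n_components, n_features)):

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(4, 3)   # 4 samples, 3 features
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])          # 2 components, 3 features

# Element-wise multiplication needs broadcast-compatible shapes; (4, 3) and (2, 3) are not.
try:
    data * components
    elementwise_failed = False
except ValueError:
    elementwise_failed = True

# The dot product with the transposed components gives one row per sample
# and one column per component (equivalent to data @ components.T).
projected = np.dot(data, components.T)            # shape (4, 2)
print(elementwise_failed, projected.shape)        # True (4, 2)
```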

Secondly, PCA.components_ has shape (n_components, n_features), while the data to transform has shape (n_samples, n_features), so you need to transpose PCA.components_ to perform the dot product; the result has shape (n_samples, n_components).

Moreover, transform first subtracts the per-feature mean learned during fitting (pca.mean_), so if you do it manually, you must also subtract the mean first.
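To see why the mean matters, the sketch below (random, deliberately uncentered data, for illustration only) checks that projecting without subtracting pca.mean_ is off from pca.transform by the constant row vector np.dot(pca.mean_, pca.components_.T):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, size=(50, 4))        # not centered: column means are near 5

pca = PCA(n_components=2).fit(data)

naive = np.dot(data, pca.components_.T)         # forgets to subtract the mean
offset = np.dot(pca.mean_, pca.components_.T)   # projected mean, shape (n_components,)

# Removing the projected mean recovers the real transform.
print(np.allclose(naive - offset, pca.transform(data)))   # True
```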

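The correct way to transform (with the default whiten=False; whitening additionally divides each component by np.sqrt(pca.explained_variance_)) is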

data_reduced = np.dot(data - pca.mean_, pca.components_.T)
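Because the formula only uses NumPy arrays, it also works on a single 1-D sample, which suits the on-the-fly use case in the question. A sketch with random data (illustration only, default whiten=False), compared against pca.transform, which expects a 2-D array:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
data = rng.normal(size=(200, 6))
pca = PCA(n_components=3).fit(data)

# Batch: (n_samples, n_features) -> (n_samples, n_components)
data_reduced = np.dot(data - pca.mean_, pca.components_.T)

# One sample: (n_features,) -> (n_components,)
x = data[0]
x_reduced = np.dot(x - pca.mean_, pca.components_.T)

print(np.allclose(data_reduced, pca.transform(data)))                # True
print(np.allclose(x_reduced, pca.transform(x.reshape(1, -1))[0]))    # True
```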

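2) inverse_transform simply reverses transform: multiply by pca.components_ (without transposing) and add the mean back. When n_components < n_features, the result is the projection of the data onto the retained components, so it only approximates the original data.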

data_original = np.dot(data_reduced, pca.components_) + pca.mean_
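A sketch comparing the manual inverse with pca.inverse_transform on random data (illustration only, default whiten=False). It also shows that the round trip reproduces the input exactly only when all components are kept:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 5))

results = {}
for k in (2, 5):                                   # fewer components vs. all 5 features
    pca = PCA(n_components=k).fit(data)
    data_reduced = np.dot(data - pca.mean_, pca.components_.T)
    data_original = np.dot(data_reduced, pca.components_) + pca.mean_   # manual inverse_transform
    matches_sklearn = np.allclose(data_original, pca.inverse_transform(data_reduced))
    exact = np.allclose(data_original, data)       # lossless only if nothing was dropped
    results[k] = (matches_sklearn, exact)
    print(k, matches_sklearn, exact)
```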

If your data already has zero mean in each column, pca.mean_ is (up to rounding) zero and you can drop it from both formulas, for example:

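import numpy as np
from sklearn.decomposition import PCA

# Example data (illustration only): 100 samples, 5 features, centered column-wise
data = np.random.default_rng(0).normal(size=(100, 5))
data -= data.mean(axis=0)

pca = PCA(n_components=3)
pca.fit(data)

data_reduced = np.dot(data, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform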
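For the offline/online split in the question, one possible sketch is to export the fitted arrays once and then reimplement both directions with NumPy only. The helpers manual_transform and manual_inverse_transform are hypothetical names chosen for illustration, and an in-memory io.BytesIO buffer stands in for a real file. The whiten branch mirrors how scikit-learn scales by np.sqrt(pca.explained_variance_) when whiten=True; treat it as an assumption worth re-checking against your installed version.

```python
import io

import numpy as np
from sklearn.decomposition import PCA

# --- Offline: fit once and save only the plain arrays ---
rng = np.random.default_rng(0)
train = rng.normal(size=(300, 8))                  # placeholder training data
pca = PCA(n_components=4, whiten=True).fit(train)

buffer = io.BytesIO()                              # a real deployment would write a file instead
np.savez(buffer,
         components=pca.components_,
         mean=pca.mean_,
         explained_variance=pca.explained_variance_,
         whiten=pca.whiten)
buffer.seek(0)

# --- Online: no scikit-learn object needed, only NumPy ---
coeffs = np.load(buffer)
components, mean = coeffs["components"], coeffs["mean"]
if bool(coeffs["whiten"]):
    scale = np.sqrt(coeffs["explained_variance"])  # per-component whitening factor
else:
    scale = np.ones(len(components))               # no whitening: scale is 1

def manual_transform(x):
    """Hypothetical helper: center, project onto the components, rescale if whitened."""
    return np.dot(x - mean, components.T) / scale

def manual_inverse_transform(z):
    """Hypothetical helper: undo the scaling, project back, add the mean."""
    return np.dot(z * scale, components) + mean

z = manual_transform(train)
print(np.allclose(z, pca.transform(train)))                                   # True
print(np.allclose(manual_inverse_transform(z), pca.inverse_transform(z)))     # True
```

Storing the plain arrays rather than pickling the PCA object keeps the online side free of any scikit-learn dependency.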

189

answered Sep 16 '22 12:09

yangjie
