I am using scikit-learn. The nature of my application is such that I do the fitting offline, and then can only use the resulting coefficients online (on the fly) to manually calculate various objectives.
The transform is simple: it is just data * pca.components_, i.e. a simple dot product. However, I have no idea how to perform the inverse transform. Which field of the pca object contains the relevant coefficients, and how do I calculate the inverse transform?
Specifically, I am referring to the PCA.inverse_transform() method of sklearn.decomposition.PCA: how can I manually reproduce its functionality using the coefficients computed by the fitted PCA object?
1) transform is not data * pca.components_.
Firstly, * is not the dot product for numpy arrays; it is element-wise multiplication. To perform a dot product, you need to use np.dot.
Secondly, the shape of PCA.components_ is (n_components, n_features), while the shape of the data to transform is (n_samples, n_features), so you need to transpose PCA.components_ to perform the dot product.
Moreover, the first step of transform is to subtract the mean; therefore, if you do it manually, you also need to subtract the mean first.
The correct way to transform is:
data_reduced = np.dot(data - pca.mean_, pca.components_.T)
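As a sanity check, here is a minimal sketch (the random toy data, the variable names, and the choice of n_components=3 are my own for illustration, and it assumes the default whiten=False) showing that this formula reproduces pca.transform:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)                    # toy data: 100 samples, 5 features

pca = PCA(n_components=3).fit(X)

# manual projection: centre the data, then project onto the principal axes
manual = np.dot(X - pca.mean_, pca.components_.T)
print(np.allclose(manual, pca.transform(X)))   # True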
2) inverse_transform is just the inverse process of transform:
data_original = np.dot(data_reduced, pca.components_) + pca.mean_
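Again as a sketch under the same assumptions (toy data, default whiten=False), you can check this against pca.inverse_transform:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)

pca = PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)

# manual reconstruction: map back to feature space, then add the mean back
manual_back = np.dot(X_reduced, pca.components_) + pca.mean_
print(np.allclose(manual_back, pca.inverse_transform(X_reduced)))   # True

Note that when n_components < n_features this is only an approximate reconstruction of the original data; the variance outside the kept components is lost.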
If your data already has zero mean in each column, you can ignore pca.mean_ above, for example:
import numpy as np
from sklearn.decomposition import PCA

# data: (n_samples, n_features) array whose columns already have zero mean
pca = PCA(n_components=3)
pca.fit(data)

data_reduced = np.dot(data, pca.components_.T)          # transform
data_original = np.dot(data_reduced, pca.components_)   # inverse_transform
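Keep in mind that this shortcut matches pca.transform and pca.inverse_transform only when each column of data really is (numerically) zero-mean, so that pca.mean_ is effectively zero; otherwise keep the pca.mean_ terms from the formulas above. Everything here also assumes the default whiten=False; with whitening enabled the projected coordinates are additionally rescaled, so the plain dot products no longer reproduce sklearn's output.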