I have been playing around with sklearn PCA and it is behaving oddly.
from sklearn.decomposition import PCA
import numpy as np
identity = np.identity(10)
pca = PCA(n_components=10)
augmented_identity = pca.fit_transform(identity)
np.linalg.norm(identity - augmented_identity)
4.5997749080745738
Note that I set the number of dimensions to be 10. Shouldn't the norm be 0?
Any insight into why it is not would be appreciated.
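Although PCA derives its orthogonal components from the covariance matrix, the input to PCA in sklearn is the data matrix, not a covariance/correlation matrix. Each row of identity is treated as one sample: PCA centers the data and returns the coordinates of those samples along the principal axes, so the result is not expected to equal the input even when all 10 components are kept. If you feed it data whose covariance is the identity, the covariance of the transformed data is also close to the identity: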
import numpy as np
from sklearn.decomposition import PCA
# 100,000 samples of a 10-dimensional Gaussian with identity covariance
X = np.random.randn(100000, 10)
pca = PCA(n_components=10)
X_transformed = pca.fit_transform(X)
np.linalg.norm(np.cov(X.T) - np.cov(X_transformed.T))
Out[219]: 0.044691263454134933
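To tie this back to the identity example in the question, here is a minimal sketch (assuming the default settings, i.e. whiten=False) that checks what fit_transform returns and how to get the original matrix back. The variable names are only for illustration. It relies on the mean_ and components_ attributes and the inverse_transform method of the fitted PCA object.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.identity(10)
pca = PCA(n_components=10)
Z = pca.fit_transform(X)

# With whiten=False, the transform is: subtract the per-feature mean,
# then project onto the principal axes (the rows of components_).
X_centered = X - pca.mean_
Z_manual = X_centered @ pca.components_.T
print(np.allclose(Z, Z_manual))

# The projection is a rotation/reflection, so it preserves the norm of
# the centered data, but Z itself is not the original matrix.
print(np.isclose(np.linalg.norm(Z), np.linalg.norm(X_centered)))

# Undoing the rotation and adding the mean back recovers the input.
X_restored = pca.inverse_transform(Z)
print(np.allclose(X_restored, X))
```

All three checks should come out True. So the quantity that goes to 0 with all 10 components is np.linalg.norm(identity - pca.inverse_transform(augmented_identity)), not the difference between the input and its transformed coordinates.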