
sklearn PCA not working

I have been playing around with sklearn PCA and it is behaving oddly.

from sklearn.decomposition import PCA
import numpy as np
identity = np.identity(10)
pca = PCA(n_components=10)
augmented_identity = pca.fit_transform(identity)
np.linalg.norm(identity - augmented_identity)

4.5997749080745738

Note that I set the number of dimensions to be 10. Shouldn't the norm be 0?

Any insight into why it is not would be appreciated.

hiqbal asked Feb 12 '26 03:02



1 Answer

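PCA derives its orthogonal components from the covariance matrix, but the input to sklearn's PCA is the data matrix itself (one sample per row), not a covariance or correlation matrix. Your identity matrix is therefore treated as 10 samples with 10 features: fit_transform centers it and returns the coordinates of each sample in the principal-component basis, so the result is not expected to equal the input even when all 10 components are kept. For data whose true covariance is the identity, the sample covariances before and after the transform are both close to the identity: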

import numpy as np
from sklearn.decomposition import PCA

# 100,000 samples of a 10-dimensional Gaussian with identity covariance
X = np.random.randn(100000, 10)

pca = PCA(n_components=10)
X_transformed = pca.fit_transform(X)

# np.cov expects variables in rows, hence the transposes
np.linalg.norm(np.cov(X.T) - np.cov(X_transformed.T))

Out[219]: 0.044691263454134933
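
If what you wanted was to get the original matrix back, the transformed data has to be mapped back to the original feature space first. A minimal sketch, assuming the default whiten=False; the names scores, reconstructed and manual are mine, not from the question:

```python
import numpy as np
from sklearn.decomposition import PCA

identity = np.identity(10)
pca = PCA(n_components=10)

# fit_transform returns the coordinates of each centered row in the
# principal-component basis, not a copy of the input.
scores = pca.fit_transform(identity)

# inverse_transform maps those coordinates back to the original feature
# space and adds the per-feature mean back.
reconstructed = pca.inverse_transform(scores)

# With all components kept, this should be floating-point noise.
roundtrip_error = np.linalg.norm(identity - reconstructed)
print(roundtrip_error)

# The same mapping written out by hand (valid for whiten=False).
manual = scores @ pca.components_ + pca.mean_
print(np.allclose(manual, reconstructed))
```

The norm in the question compares the input against scores, which live in a rotated and centered coordinate system, so a non-zero value there is expected rather than a sign that PCA is broken.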
Jianxun Li answered Feb 15 '26 14:02



