Obtain eigen values and vectors from sklearn PCA

Tags:

How I can get the the eigen values and eigen vectors of the PCA application?

from sklearn.decomposition import PCA clf=PCA(0.98,whiten=True)      #converse 98% variance X_train=clf.fit_transform(X_train) X_test=clf.transform(X_test)

I can't find it in docs.

1.I am "not" able to comprehend the different results here.

Edit:

def pca_code(data):     #raw_implementation     var_per=.98     data-=np.mean(data, axis=0)     data/=np.std(data, axis=0)     cov_mat=np.cov(data, rowvar=False)     evals, evecs = np.linalg.eigh(cov_mat)     idx = np.argsort(evals)[::-1]     evecs = evecs[:,idx]     evals = evals[idx]     variance_retained=np.cumsum(evals)/np.sum(evals)     index=np.argmax(variance_retained>=var_per)     evecs = evecs[:,:index+1]     reduced_data=np.dot(evecs.T, data.T).T     print(evals)     print("_"*30)     print(evecs)     print("_"*30)     #using scipy package     clf=PCA(var_per)     X_train=data.T     X_train=clf.fit_transform(X_train)     print(clf.explained_variance_)     print("_"*30)     print(clf.components_)     print("__"*30)

I wish to obtain all the eigenvalues and eigenvectors instead of just the reduced set with the convergence condition.

604

asked Aug 09 '15 23:08

Abhishek Bhatia

1 Answers

Your implementation

You are computing the eigenvectors of the correlation matrix, that is the covariance matrix of the normalized variables.
data/=np.std(data, axis=0) is not part of the classic PCA, we only center the variables. So the sklearn PCA does not feature scale the data beforehand.

Apart from that you are on the right track, if we abstract the fact that the code you provided did not run ;). You only got confused with the row/column layouts. Honestly I think it's much easier to start with X = data.T and work only with X from there on. I added your code 'fixed' at the end of the post.

Getting the eigenvalues

You already noted that you can get the eigenvectors using clf.components_.

So you have the principal components. They are eigenvectors of the covariance matrix 𝑋ᵀ𝑋.

A way to retrieve the eigenvalues from there is to apply this matrix to each principal components and project the results onto the component. Let v_1 be the first principal component and lambda_1 the associated eigenvalue. We have:
and thus: eq2 since eq3 . (x, y) the scalar product of vectors x and y.

Back in Python you can do:

n_samples = X.shape[0] # We center the data and compute the sample covariance matrix. X -= np.mean(X, axis=0) cov_matrix = np.dot(X.T, X) / n_samples for eigenvector in pca.components_:     print(np.dot(eigenvector.T, np.dot(cov_matrix, eigenvector)))

And you get the eigenvalue associated with the eigenvector. Well, in my tests it turned out not to work with the couple last eigenvalues but I'd attribute that to my absence of skills in numerical stability.

Now that's not the best way to get the eigenvalues but it's nice to know where they come from.
The eigenvalues represent the variance in the direction of the eigenvector. So you can get them through the pca.explained_variance_ attribute:

eigenvalues = pca.explained_variance_

Here is a reproducible example that prints the eigenvalues you get with each method:

import numpy as np from sklearn.decomposition import PCA from sklearn.datasets import make_classification   X, y = make_classification(n_samples=1000) n_samples = X.shape[0]  pca = PCA() X_transformed = pca.fit_transform(X)  # We center the data and compute the sample covariance matrix. X_centered = X - np.mean(X, axis=0) cov_matrix = np.dot(X_centered.T, X_centered) / n_samples eigenvalues = pca.explained_variance_ for eigenvalue, eigenvector in zip(eigenvalues, pca.components_):         print(np.dot(eigenvector.T, np.dot(cov_matrix, eigenvector)))     print(eigenvalue)

Your original code, fixed

If you run it you'll see the values are consistent. They're not exactly equal because numpy and scikit-learn are not using the same algorithm here.
The main thing was that you were using correlation matrix instead of covariance, as mentioned above. Also you were getting the transposed eigenvectors from numpy which made it very confusing.

import numpy as np from scipy.stats.mstats import zscore from sklearn.decomposition import PCA  def pca_code(data):     #raw_implementation     var_per=.98     data-=np.mean(data, axis=0)     # data/=np.std(data, axis=0)     cov_mat=np.cov(data, rowvar=False)     evals, evecs = np.linalg.eigh(cov_mat)     idx = np.argsort(evals)[::-1]     evecs = evecs[:,idx]     evals = evals[idx]     variance_retained=np.cumsum(evals)/np.sum(evals)     index=np.argmax(variance_retained>=var_per)     evecs = evecs[:,:index+1]     reduced_data=np.dot(evecs.T, data.T).T     print("evals", evals)     print("_"*30)     print(evecs.T[1, :])     print("_"*30)     #using scipy package     clf=PCA(var_per)     X_train=data     X_train=clf.fit_transform(X_train)     print(clf.explained_variance_)     print("_"*30)     print(clf.components_[1,:])     print("__"*30)

Hope this helps, feel free to ask for clarifications.

157

answered Oct 03 '22 05:10

ldirer

Related questions
                            
                                Melt the Upper Triangular Matrix of a Pandas Dataframe
                            
                                What might be the cause of 'invalid value encountered in less_equal' in numpy
                            
                                How to suppress a third-party warning using warnings.filterwarnings
                            
                                How to silence "sys.excepthook is missing" error?
                            
                                Python unittest - setUpClass() is giving me trouble - why can't I inherit like this?
                            
                                Compiling Python
                            
                                Non-ASCII characters in Matplotlib
                            
                                How to get a row-by-row MySQL ResultSet in python
                            
                                Which database engine to choose for Django app? [closed]
                            
                                ValueError: zero length field name in format in Python2.6.6
                            
                                Renaming a file in PyCharm
                            
                                How can I serialize a numpy array while preserving matrix dimensions?
                            
                                Managing connection to redis from Python
                            
                                How to change default install location for pip
                            
                                Which Python API should be used with Mongo DB and Django
                            
                                How to mock python's datetime.now() in a class method for unit testing?
                            
                                ImportError: No module named 'Cython' [duplicate]
                            
                                Python - is there a "don't care" symbol for tuple assignments?
                            
                                zip(list1, list2) in Jinja2?
                            
                                Reusing code from different IPython notebooks

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Obtain eigen values and vectors from sklearn PCA

Tags:

python

scipy

scikit-learn

pca