
Sklearn PCA explained variance and explained variance ratio difference

I'm trying to get the variances from the eigenvectors.

What is the difference between explained_variance_ratio_ and explained_variance_ in PCA?

Kumaresh Babu N S asked Dec 31 '22 18:12


1 Answer

The fraction (ratio) of the total variance explained by each component is:

explained_variance_ratio_

The variance explained by each component, i.e. the corresponding eigenvalue of the sample covariance matrix of X, is:

explained_variance_
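To check the "eigenvalues of the covariance matrix" claim directly, here is a small sketch that reuses the toy X from the example in this answer and computes the eigenvalues with plain NumPy. It assumes the sample covariance (n - 1 denominator), which is what np.cov uses by default and what scikit-learn uses for explained_variance_.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Sample covariance matrix of the features (np.cov divides by n - 1 by default)
cov = np.cov(X, rowvar=False)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order;
# reverse them so they are sorted largest first, like explained_variance_
eigenvalues = np.linalg.eigvalsh(cov)[::-1]

pca = PCA(n_components=2).fit(X)
print(eigenvalues)              # same values as the line below
print(pca.explained_variance_)
```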

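Formula: explained_variance_ratio_ = explained_variance_ / (total variance of X), where the total variance is the sum of all eigenvalues of the covariance matrix. When all components are kept (as in the example below), this is the same as explained_variance_ / np.sum(explained_variance_).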
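The denominator matters once fewer components are kept than there are features. In scikit-learn the ratio is taken against the total variance of the data, not against the sum of the retained eigenvalues. A minimal sketch, assuming the same toy X as the example and keeping a single component:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Keep only the first principal component
pca = PCA(n_components=1).fit(X)

# Total variance: sum of the per-feature sample variances (ddof=1),
# which equals the sum of all eigenvalues of the covariance matrix
total_var = np.var(X, axis=0, ddof=1).sum()

print(pca.explained_variance_ / total_var)   # matches the next line
print(pca.explained_variance_ratio_)

# Dividing by the sum of the *kept* eigenvalues always gives 1.0 here,
# so it is not the explained variance ratio when components are dropped
print(pca.explained_variance_ / pca.explained_variance_.sum())
```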

Example:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)

# the actual eigenvalues of the covariance matrix (variance per component)
print(pca.explained_variance_)        # [7.93954312 0.06045688]

# the fraction of the total variance explained by each component
print(pca.explained_variance_ratio_)  # [0.99244289 0.00755711]
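Since the question is about getting the variances from the eigenvectors: each eigenvalue is the variance of the data projected onto the matching eigenvector. A sketch assuming the default whiten=False, where the rows of pca.components_ are the eigenvectors and the projection is (X - mean) @ components_.T:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2).fit(X)

# Center the data and project it onto the eigenvectors (rows of components_)
X_centered = X - pca.mean_
scores = X_centered @ pca.components_.T   # equals pca.transform(X) when whiten=False

# The sample variance (ddof=1) of each projected column is that component's eigenvalue
print(scores.var(axis=0, ddof=1))
print(pca.explained_variance_)
```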

Also based on the above formula (both components are kept, so the denominator is the full total variance, 8.0):

7.93954312 / (7.93954312 + 0.06045688) = 7.93954312 / 8.0 = 0.99244289

From the documentation:

explained_variance_ : array, shape (n_components,) The amount of variance explained by each of the selected components.

Equal to n_components largest eigenvalues of the covariance matrix of X.

New in version 0.18.

explained_variance_ratio_ : array, shape (n_components,) Percentage of variance explained by each of the selected components.

If n_components is not set then all components are stored and the sum of the ratios is equal to 1.0.
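The last point of the documentation can be checked on the same toy X. This sketch compares a fit with n_components unset against one that keeps a single component; the sums are floating-point values, so "equal to 1.0" means up to rounding.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# n_components not set: all components are stored
pca_all = PCA().fit(X)
print(pca_all.explained_variance_ratio_.sum())   # 1.0 up to floating-point rounding

# Only one component stored: the ratios no longer add up to 1.0
pca_one = PCA(n_components=1).fit(X)
print(pca_one.explained_variance_ratio_.sum())   # less than 1.0
```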

seralouk answered Jan 10 '23 06:01