Sklearn PCA explained variance and explained variance ratio difference

I'm trying to get the variances from the eigenvectors.

What is the difference between explained_variance_ratio_ and explained_variance_ in PCA?

Kumaresh Babu N S asked Dec 31 '22 18:12

1 Answer

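The fraction of the total variance explained by each component is:

pca.explained_variance_ratio_

The variance of each component, i.e. the eigenvalues of the covariance matrix, is:

pca.explained_variance_

Formula: explained_variance_ratio_ = explained_variance_ / np.sum(explained_variance_)

This holds when all components are kept. If n_components is smaller than the number of features, the denominator is the total variance of the data, not just the sum over the kept components.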


import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)

pca.explained_variance_ # the actual eigenvalues (variance)
array([7.93954312, 0.06045688])

pca.explained_variance_ratio_ # the percentage of the variance
array([0.99244289, 0.00755711])

Also based on the above formula:

7.93954312 / (7.93954312 + 0.06045688) = 0.99244289
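The same check can be done in code instead of by hand. This is a minimal sketch that reuses the example data from the answer; the names manual_ratio and ratios_match are only illustrative, and the comparison is valid here because both components are kept.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2).fit(X)

# Recompute the ratio from the per-component variances.
# This is only valid because all components were kept (n_components == n_features).
manual_ratio = pca.explained_variance_ / np.sum(pca.explained_variance_)

# Compare against the attribute that scikit-learn computes itself.
ratios_match = np.allclose(manual_ratio, pca.explained_variance_ratio_)
print(ratios_match)
```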

From the documentation:

explained_variance_ : array, shape (n_components,) The amount of variance explained by each of the selected components.

Equal to n_components largest eigenvalues of the covariance matrix of X.
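To see the eigenvalue statement concretely, the sketch below computes the sample covariance matrix with NumPy and compares its eigenvalues to explained_variance_. It assumes the same example data as above; cov, eigenvalues and eigs_match are illustrative names, not part of scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Sample covariance of the features (np.cov divides by n_samples - 1 by default).
cov = np.cov(X, rowvar=False)

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order,
# so reverse them to get largest-first, like PCA reports them.
eigenvalues = np.linalg.eigvalsh(cov)[::-1]

pca = PCA(n_components=2).fit(X)
eigs_match = np.allclose(eigenvalues, pca.explained_variance_)
print(eigenvalues, eigs_match)
```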

New in version 0.18.

explained_variance_ratio_ : array, shape (n_components,) Percentage of variance explained by each of the selected components.

If n_components is not set then all components are stored and the sum of the ratios is equal to 1.0.
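Conversely, when fewer components are requested, the ratios no longer sum to 1.0, because they are taken relative to the total variance of the data. A minimal sketch with the same example data, where pca1, total_variance and ratio_sum are illustrative names:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Keep only the first principal component.
pca1 = PCA(n_components=1).fit(X)

# Total variance of the data: sum of the per-feature sample variances.
total_variance = X.var(axis=0, ddof=1).sum()

# The ratio is the kept variance divided by the total variance,
# so with a component dropped the ratios sum to less than 1.0.
ratio_sum = pca1.explained_variance_ratio_.sum()
print(ratio_sum < 1.0)
print(np.allclose(pca1.explained_variance_ / total_variance,
                  pca1.explained_variance_ratio_))
```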

seralouk answered Jan 10 '23 06:01