
scikit-learn kernel PCA explained variance

I have been using the standard PCA from scikit-learn and can get the explained variance ratios for each principal component without any issues.

import sklearn.decomposition

pca = sklearn.decomposition.PCA(n_components=3)
pca_transform = pca.fit_transform(feature_vec)
var_values = pca.explained_variance_ratio_

I want to explore different kernels using kernel PCA and also want the explained variance ratios, but I now see that it doesn't have this attribute. Does anyone know how to get these values?

kpca = sklearn.decomposition.KernelPCA(kernel=kernel, n_components=3)
kpca_transform = kpca.fit_transform(feature_vec)
var_values = kpca.explained_variance_ratio_

AttributeError: 'KernelPCA' object has no attribute 'explained_variance_ratio_'

asked Apr 13 '15 by ecn113



3 Answers

I know this question is old, but I ran into the same 'problem' and found an easy solution when I realized that pca.explained_variance_ is simply the variance of the components. You can compute the explained variance (and ratio) yourself by doing:

import numpy

kpca_transform = kpca.fit_transform(feature_vec)
explained_variance = numpy.var(kpca_transform, axis=0)
explained_variance_ratio = explained_variance / numpy.sum(explained_variance)

and as a bonus, to get the cumulative proportion of explained variance (often useful for selecting components and estimating the dimensionality of your space):

numpy.cumsum(explained_variance_ratio) 
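To make this concrete, here is a minimal, self-contained sketch (using the iris data and an arbitrary 90% threshold purely for illustration) that picks the smallest number of components whose cumulative ratio reaches the threshold; note these ratios are relative only to the components you actually kept:

import numpy
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA

# illustrative data; any (n_samples, n_features) feature matrix works the same way
feature_vec = load_iris().data

kpca = KernelPCA(kernel='rbf', n_components=4)
kpca_transform = kpca.fit_transform(feature_vec)

explained_variance = numpy.var(kpca_transform, axis=0)
explained_variance_ratio = explained_variance / numpy.sum(explained_variance)
cumulative = numpy.cumsum(explained_variance_ratio)

# smallest number of components whose cumulative ratio reaches 90%
# (0.9 is an arbitrary illustrative threshold, not a rule)
n_components = int(numpy.searchsorted(cumulative, 0.90)) + 1
print(cumulative, n_components)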
answered Sep 20 '22 by EelkeSpaak


The main reason KernelPCA does not have explained_variance_ratio_ is that after the kernel transformation your data/vectors live in a different feature space. Hence kernel PCA is not supposed to be interpreted in the same way as PCA.
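A small sketch (reusing the variance-of-components trick from the answer above, with arbitrary gamma values) illustrates this: the "explained variance" you compute lives in whatever feature space the kernel induces, so it changes with the kernel parameters even though the input data does not:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

x, _ = make_moons(n_samples=100, random_state=123)

# same data, different kernel widths: the variance profile of the components
# is a property of the kernel-induced feature space, not of the input space
for gamma in (1, 15, 100):
    scores = KernelPCA(n_components=2, kernel='rbf', gamma=gamma).fit_transform(x)
    var = np.var(scores, axis=0)
    print(gamma, var / var.sum())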

answered Sep 20 '22 by Krishna Kalyan


I was intrigued by this as well, so I did some testing. Below is my code.

The plots show that the first component of the kernel PCA is a better discriminator of the dataset. However, when the explained variance ratios are calculated following @EelkeSpaak's explanation, we see only a 50% explained variance ratio, which doesn't make sense. Hence I am inclined to agree with @Krishna Kalyan's explanation.

#get data
from sklearn.datasets import make_moons 
import numpy as np
import matplotlib.pyplot as plt

x, y = make_moons(n_samples=100, random_state=123)
plt.scatter(x[y==0, 0], x[y==0, 1], color='red', marker='^', alpha=0.5)
plt.scatter(x[y==1, 0], x[y==1, 1], color='blue', marker='o', alpha=0.5)
plt.show()

##seeing effect of linear-pca-------
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
x_pca = pca.fit_transform(x)

x_tx = x_pca
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7,3))
ax[0].scatter(x_tx[y==0, 0], x_tx[y==0, 1], color='red', marker='^', alpha=0.5)
ax[0].scatter(x_tx[y==1, 0], x_tx[y==1, 1], color='blue', marker='o', alpha=0.5)
ax[1].scatter(x_tx[y==0, 0], np.zeros((50,1))+0.02, color='red', marker='^', alpha=0.5)
ax[1].scatter(x_tx[y==1, 0], np.zeros((50,1))-0.02, color='blue', marker='o', alpha=0.5)
ax[0].set_xlabel('PC-1')
ax[0].set_ylabel('PC-2')
ax[0].set_ylim([-0.8,0.8])
ax[1].set_ylim([-0.8,0.8])
ax[1].set_yticks([])
ax[1].set_xlabel('PC-1')
plt.show()

##seeing effect of kernelized-pca------
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
x_kpca = kpca.fit_transform(x)


x_tx = x_kpca
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7,3))
ax[0].scatter(x_tx[y==0, 0], x_tx[y==0, 1], color='red', marker='^', alpha=0.5)
ax[0].scatter(x_tx[y==1, 0], x_tx[y==1, 1], color='blue', marker='o', alpha=0.5)
ax[1].scatter(x_tx[y==0, 0], np.zeros((50,1))+0.02, color='red', marker='^', alpha=0.5)
ax[1].scatter(x_tx[y==1, 0], np.zeros((50,1))-0.02, color='blue', marker='o', alpha=0.5)
ax[0].set_xlabel('PC-1')
ax[0].set_ylabel('PC-2')
ax[0].set_ylim([-0.8,0.8])
ax[1].set_ylim([-0.8,0.8])
ax[1].set_yticks([])
ax[1].set_xlabel('PC-1')
plt.show()

##comparing the 2 pcas-------

#get the transformer
tx_pca = pca.fit(x)
tx_kpca = kpca.fit(x)

#transform the original data
x_pca = tx_pca.transform(x)
x_kpca = tx_kpca.transform(x)

#for the transformed data, get the explained variances
expl_var_pca = np.var(x_pca, axis=0)
expl_var_kpca = np.var(x_kpca, axis=0)
print('explained variance pca: ', expl_var_pca)
print('explained variance kpca: ', expl_var_kpca)

expl_var_ratio_pca = expl_var_pca / np.sum(expl_var_pca)
expl_var_ratio_kpca = expl_var_kpca / np.sum(expl_var_kpca)

print('explained variance ratio pca: ', expl_var_ratio_pca)
print('explained variance ratio kpca: ', expl_var_ratio_kpca)
answered Sep 19 '22 by Faz