I have been using the normal PCA from scikit-learn and get the variance ratios for each principal component without any issues.
import sklearn.decomposition

pca = sklearn.decomposition.PCA(n_components=3)
pca_transform = pca.fit_transform(feature_vec)
var_values = pca.explained_variance_ratio_
I want to explore different kernels using kernel PCA and also want the explained variance ratios, but I am now seeing that it doesn't have this attribute. Does anyone know how to get these values?
kpca = sklearn.decomposition.KernelPCA(kernel=kernel, n_components=3)
kpca_transform = kpca.fit_transform(feature_vec)
var_values = kpca.explained_variance_ratio_
AttributeError: 'KernelPCA' object has no attribute 'explained_variance_ratio_'
Some criteria say that the total variance explained by all retained components should be between 70% and 80%, which in this case would mean about four to five components. The authors of the book say that this may be untenable for social science research, where extracted factors usually explain only 50% to 60% of the variance.
The explained variance ratio is the percentage of variance attributed to each of the selected components. Ideally, you would choose the number of components to include in your model by adding up the explained variance ratios of the components until you reach a total of around 0.8, or 80%, to avoid overfitting.
The total variance is the sum of variances of all individual principal components. The fraction of variance explained by a principal component is the ratio between the variance of that principal component and the total variance. For several principal components, add up their variances and divide by the total variance.
The cumulative explained variance shows the accumulation of variance for each principal component number. The individual explained variance describes the variance of each principal component.
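For standard PCA this selection can be read straight off explained_variance_ratio_. A minimal sketch, assuming feature_vec is the data matrix from the question and an (arbitrary) 0.8 threshold:
import numpy as np
from sklearn.decomposition import PCA

pca = PCA()  # n_components=None keeps every component
pca.fit(feature_vec)

# cumulative explained variance, then the first component count that reaches the threshold
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = np.argmax(cumulative >= 0.8) + 1
print(n_components, cumulative[:n_components])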
I know this question is old, but I ran into the same 'problem' and found an easy solution when I realized that the pca.explained_variance_
is simply the variance of the components. You can simply compute the explained variance (and ratio) by doing:
import numpy

kpca_transform = kpca.fit_transform(feature_vec)
explained_variance = numpy.var(kpca_transform, axis=0)
explained_variance_ratio = explained_variance / numpy.sum(explained_variance)
and as a bonus, to get the cumulative proportion of explained variance (often useful in selecting components and estimating the dimensionality of your space):
numpy.cumsum(explained_variance_ratio)
The main reason K-PCA does not have explained_variance_ratio_
is that, after the kernel transformation, your data/vectors live in a different feature space. Hence kernel PCA is not supposed to be interpreted like PCA.
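That said, KernelPCA does expose the eigenvalues of the centered kernel matrix (eigenvalues_ in recent scikit-learn releases, lambdas_ in older ones), and these play the role of the component variances in the kernel feature space, so a ratio can be computed from them if that is what you want. A sketch under that assumption, again using feature_vec from the question:
import numpy as np
from sklearn.decomposition import KernelPCA

kpca = KernelPCA(kernel='rbf', n_components=3)
kpca.fit(feature_vec)

# eigenvalues of the centered kernel matrix ~ variances along each kernel PC
eigvals = getattr(kpca, 'eigenvalues_', None)
if eigvals is None:  # older scikit-learn versions call this lambdas_
    eigvals = kpca.lambdas_

# note: with n_components=3 these ratios are relative to the retained
# components only; fit with n_components=None to normalize over all of them
explained_variance_ratio = eigvals / np.sum(eigvals)
print(explained_variance_ratio)
Either way, the ratios describe variance in the kernel feature space, not in the original input space.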
I was intrigued by this as well, so I did some testing. Below is my code.
The plots will show that the first component of the KernelPCA is a better discriminator of the dataset. However, when the explained variance ratios are calculated based on @EelkeSpaak's explanation, we see only about a 50% explained variance ratio, which doesn't make sense. Hence it inclines me to agree with @Krishna Kalyan's explanation.
#get data
from sklearn.datasets import make_moons
import numpy as np
import matplotlib.pyplot as plt
x, y = make_moons(n_samples=100, random_state=123)
plt.scatter(x[y==0, 0], x[y==0, 1], color='red', marker='^', alpha=0.5)
plt.scatter(x[y==1, 0], x[y==1, 1], color='blue', marker='o', alpha=0.5)
plt.show()
##seeing effect of linear-pca-------
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
x_pca = pca.fit_transform(x)
x_tx = x_pca
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7,3))
ax[0].scatter(x_tx[y==0, 0], x_tx[y==0, 1], color='red', marker='^', alpha=0.5)
ax[0].scatter(x_tx[y==1, 0], x_tx[y==1, 1], color='blue', marker='o', alpha=0.5)
ax[1].scatter(x_tx[y==0, 0], np.zeros((50,1))+0.02, color='red', marker='^', alpha=0.5)
ax[1].scatter(x_tx[y==1, 0], np.zeros((50,1))-0.02, color='blue', marker='o', alpha=0.5)
ax[0].set_xlabel('PC-1')
ax[0].set_ylabel('PC-2')
ax[0].set_ylim([-0.8,0.8])
ax[1].set_ylim([-0.8,0.8])
ax[1].set_yticks([])
ax[1].set_xlabel('PC-1')
plt.show()
##seeing effect of kernelized-pca------
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
x_kpca = kpca.fit_transform(x)
x_tx = x_kpca
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7,3))
ax[0].scatter(x_tx[y==0, 0], x_tx[y==0, 1], color='red', marker='^', alpha=0.5)
ax[0].scatter(x_tx[y==1, 0], x_tx[y==1, 1], color='blue', marker='o', alpha=0.5)
ax[1].scatter(x_tx[y==0, 0], np.zeros((50,1))+0.02, color='red', marker='^', alpha=0.5)
ax[1].scatter(x_tx[y==1, 0], np.zeros((50,1))-0.02, color='blue', marker='o', alpha=0.5)
ax[0].set_xlabel('PC-1')
ax[0].set_ylabel('PC-2')
ax[0].set_ylim([-0.8,0.8])
ax[1].set_ylim([-0.8,0.8])
ax[1].set_yticks([])
ax[1].set_xlabel('PC-1')
plt.show()
##comparing the 2 pcas-------
#get the transformer
tx_pca = pca.fit(x)
tx_kpca = kpca.fit(x)
#transform the original data
x_pca = tx_pca.transform(x)
x_kpca = tx_kpca.transform(x)
#for the transformed data, get the explained variances
expl_var_pca = np.var(x_pca, axis=0)
expl_var_kpca = np.var(x_kpca, axis=0)
print('explained variance pca: ', expl_var_pca)
print('explained variance kpca: ', expl_var_kpca)
expl_var_ratio_pca = expl_var_pca / np.sum(expl_var_pca)
expl_var_ratio_kpca = expl_var_kpca / np.sum(expl_var_kpca)
print('explained variance ratio pca: ', expl_var_ratio_pca)
print('explained variance ratio kpca: ', expl_var_ratio_kpca)
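As a follow-up, the cumulative proportions from @EelkeSpaak's recipe can be printed the same way (with the same caveat that these are variances of the projected data, not of the original input space):
print('cumulative explained variance ratio pca: ', np.cumsum(expl_var_ratio_pca))
print('cumulative explained variance ratio kpca: ', np.cumsum(expl_var_ratio_kpca))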