I've been doing some Geometrical Data Analysis (GDA) such as Principal Component Analysis (PCA). I'm looking to plot a correlation circle.
Basically, it lets you measure the extent to which each variable is correlated with the principal components (dimensions) of a dataset.
Does anyone know if there is a Python package that plots such a data visualization?
Here is a simple example using sklearn and the iris dataset. Includes both the factor map for the first two dimensions and a scree plot:
from sklearn.decomposition import PCA
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')
n_components = 4
# Do the PCA.
pca = PCA(n_components=n_components)
reduced = pca.fit_transform(df[['sepal_length', 'sepal_width',
                                'petal_length', 'petal_width']])
# Append the principal components for each entry to the dataframe
for i in range(0, n_components):
    df['PC' + str(i + 1)] = reduced[:, i]
print(df.head())  # use display(df.head()) in a Jupyter notebook
# Do a scree plot
ind = np.arange(0, n_components)
(fig, ax) = plt.subplots(figsize=(8, 6))
sns.pointplot(x=ind, y=pca.explained_variance_ratio_)
ax.set_title('Scree plot')
ax.set_xticks(ind)
ax.set_xticklabels(ind)
ax.set_xlabel('Component Number')
ax.set_ylabel('Explained Variance')
plt.show()
# Show the points in terms of the first two PCs
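# Recent seaborn releases require keyword arguments and use `height`
# in place of the old `size` parameter.
g = sns.lmplot(x='PC1',
               y='PC2',
               hue='species',
               data=df,
               fit_reg=False,
               scatter=True,
               height=7)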
plt.show()
# Plot a variable factor map for the first two dimensions.
(fig, ax) = plt.subplots(figsize=(8, 8))
for i in range(0, pca.components_.shape[1]):
    ax.arrow(0,
             0,  # Start the arrow at the origin
             pca.components_[0, i],  # 0 for PC1
             pca.components_[1, i],  # 1 for PC2
             head_width=0.1,
             head_length=0.1)
    plt.text(pca.components_[0, i] + 0.05,
             pca.components_[1, i] + 0.05,
             df.columns.values[i])
an = np.linspace(0, 2 * np.pi, 100)
plt.plot(np.cos(an), np.sin(an)) # Add a unit circle for scale
plt.axis('equal')
ax.set_title('Variable factor map')
plt.show()
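One caveat about the factor map above: the arrows are the raw loadings (eigenvector entries) in `pca.components_`, not correlation coefficients. Here is a minimal sketch of how the correlations themselves could be computed. It assumes the variables are standardized first and uses scikit-learn's bundled copy of iris (`load_iris`) rather than the seaborn one; the helper name `variable_pc_correlations` is made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def variable_pc_correlations(X, n_components=2):
    """Return an (n_features, n_components) array of Pearson correlations
    between each original variable and each principal component score."""
    X_std = StandardScaler().fit_transform(X)
    scores = PCA(n_components=n_components).fit_transform(X_std)
    n_features = X_std.shape[1]
    # np.corrcoef treats rows as variables, so pass the transposes and keep
    # the block that relates variables (rows) to components (columns).
    full = np.corrcoef(X_std.T, scores.T)
    return full[:n_features, n_features:]


iris = load_iris()
corr = variable_pc_correlations(iris.data, n_components=2)
for name, (c1, c2) in zip(iris.feature_names, corr):
    print(f"{name:20s} PC1={c1:+.2f} PC2={c2:+.2f}")
```

Each row of `corr` gives the coordinates of one variable inside the unit circle, so it could be passed to the arrow-drawing loop in place of `pca.components_[0, i]` and `pca.components_[1, i]`.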
It'd be a good exercise to extend this to further PCs, to deal with scaling if all components are small, and to avoid plotting factors with minimal contributions.
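One possible take on that exercise is sketched below. The function name `plot_factor_map` and its parameters (`pcs`, `min_length`) are hypothetical, and the thresholds are arbitrary choices rather than established defaults. It uses scikit-learn's bundled iris data and a headless matplotlib backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA


def plot_factor_map(components, names, pcs=(0, 1), min_length=0.1, ax=None):
    """Draw variable arrows for any pair of PCs and return the names drawn.

    components: array shaped like pca.components_ (n_components, n_features).
    min_length: arrows shorter than this in the chosen plane are skipped.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 8))
    i, j = pcs
    drawn, longest = [], 0.0
    for k, name in enumerate(names):
        x, y = components[i, k], components[j, k]
        length = np.hypot(x, y)
        if length < min_length:
            continue  # negligible contribution in this plane
        ax.arrow(0, 0, x, y, head_width=0.03, head_length=0.03,
                 length_includes_head=True)
        ax.text(x * 1.1, y * 1.1, name)
        drawn.append(name)
        longest = max(longest, length)
    an = np.linspace(0, 2 * np.pi, 100)
    ax.plot(np.cos(an), np.sin(an))  # unit circle for scale
    ax.set_aspect('equal')
    if 0 < longest < 0.5:
        # All arrows are short: zoom in so they stay readable.
        ax.set_xlim(-1.2 * longest, 1.2 * longest)
        ax.set_ylim(-1.2 * longest, 1.2 * longest)
    ax.set_xlabel(f'PC{i + 1}')
    ax.set_ylabel(f'PC{j + 1}')
    ax.set_title('Variable factor map')
    return drawn


iris = load_iris()
pca = PCA(n_components=4).fit(iris.data)
# Plot PC1 against PC3; call plt.show() or plt.savefig(...) to view the result.
drawn = plot_factor_map(pca.components_, iris.feature_names, pcs=(0, 2))
print(drawn)
```

Returning the list of drawn names makes it easy to see which variables were filtered out by `min_length` for a given pair of components.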