The figure below is plotted using the open-air R package:
I know matplotlib has the plt.matshow
function,
but it can't clearly show the relation between variables at the same time.
Here is my early work:
df is a pandas dataframe with 7 variables shows like below:
I don't know how to attach a .csv
file to StackOverflow.
Using plt.matshow(df.corr(),cmap = plt.cm.Greens)
, the figure shows like this:
The second figure can't represent the correlation relations of the variables as clearly as the first one.
I upload the csv file to Google docs here.
I'm not aware of any existing Python library that does these "ellipse plots", but it's not particularly hard to implement using a matplotlib.collections.EllipseCollection
:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection
def plot_corr_ellipses(data, ax=None, **kwargs):
M = np.array(data)
if not M.ndim == 2:
raise ValueError('data must be a 2D array')
if ax is None:
fig, ax = plt.subplots(1, 1, subplot_kw={'aspect':'equal'})
ax.set_xlim(-0.5, M.shape[1] - 0.5)
ax.set_ylim(-0.5, M.shape[0] - 0.5)
# xy locations of each ellipse center
xy = np.indices(M.shape)[::-1].reshape(2, -1).T
# set the relative sizes of the major/minor axes according to the strength of
# the positive/negative correlation
w = np.ones_like(M).ravel()
h = 1 - np.abs(M).ravel()
a = 45 * np.sign(M).ravel()
ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
transOffset=ax.transData, array=M.ravel(), **kwargs)
ax.add_collection(ec)
# if data is a DataFrame, use the row/column names as tick labels
if isinstance(data, pd.DataFrame):
ax.set_xticks(np.arange(M.shape[1]))
ax.set_xticklabels(data.columns, rotation=90)
ax.set_yticks(np.arange(M.shape[0]))
ax.set_yticklabels(data.index)
return ec
For example, using your data:
data = df.corr()
fig, ax = plt.subplots(1, 1)
m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)
Negative correlations can be plotted as ellipses with the opposite orientation:
fig2, ax2 = plt.subplots(1, 1)
data2 = np.linspace(-1, 1, 9).reshape(3, 3)
m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
cb2 = fig2.colorbar(m2)
ax2.margins(0.3)
Assuming you are interested in showing cluster relations, the seaborn
package mentioned in the comments also has a clustermap. Using your correlation matrix (looks like you want to show correlation coefficients as int
in the [-100, 100]
range, you could do the following:
corr = df.corr().mul(100).astype(int)
GX HG RM SJ XB XN ZG
GX 100 77 62 71 48 66 57
HG 77 100 69 74 61 61 58
RM 62 69 100 75 48 64 68
SJ 71 74 75 100 50 70 65
XB 48 61 48 50 100 46 51
XN 66 61 64 70 46 100 75
ZG 57 58 68 65 51 75 100
and then use seaborn.clustermap()
as follows:
import seaborn as sns
sns.clustermap(data=corr, annot=True, fmt='d', cmap='Greens').savefig('cluster.png')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With