 

How can I plot a correlation matrix as a set of ellipses, similar to the R openair package?

The figure below was plotted using the R openair package:

[image: a correlation matrix showing the relationships between variables]

I know matplotlib has the plt.matshow function,
but it can't clearly show the relation between variables at the same time.

Here is my early work:

df is a pandas DataFrame with 7 variables, as shown below:

[image: preview of df]

I don't know how to attach a .csv file to StackOverflow.

Using plt.matshow(df.corr(), cmap=plt.cm.Greens), the figure looks like this:

[image: plt.matshow output]

The second figure doesn't represent the correlations between the variables as clearly as the first one.

Edit:

I have uploaded the CSV file to Google Docs here.
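For readers without access to that CSV, a stand-in df can be generated to run the answers below. This is only a sketch with synthetic data: the column names are borrowed from the correlation table in the second answer, while the sample size, random seed, and noise level are arbitrary assumptions, so the resulting coefficients will not match the asker's.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data (NOT the real CSV).
# Seed, sample size and noise scale are arbitrary assumptions.
rng = np.random.default_rng(0)
names = ["GX", "HG", "RM", "SJ", "XB", "XN", "ZG"]

# A shared component makes all seven columns positively correlated.
common = rng.normal(size=(200, 1))
noise = rng.normal(size=(200, len(names)))
df = pd.DataFrame(common + 0.8 * noise, columns=names)

# 7 x 7 matrix of Pearson correlation coefficients
corr = df.corr()
print(corr.round(2))
```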

Han Zhengzu asked Jan 01 '16 12:01



2 Answers

I'm not aware of any existing Python library that does these "ellipse plots", but it's not particularly hard to implement using a matplotlib.collections.EllipseCollection:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection

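def plot_corr_ellipses(data, ax=None, **kwargs):
    """Draw a 2D correlation matrix as a grid of ellipses.

    Extra keyword arguments (e.g. cmap, clim) are passed on to
    EllipseCollection. Returns the collection, which can be handed to
    fig.colorbar().
    """
    M = np.array(data)
    if M.ndim != 2:
        raise ValueError('data must be a 2D array')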
    if ax is None:
        fig, ax = plt.subplots(1, 1, subplot_kw={'aspect':'equal'})
        ax.set_xlim(-0.5, M.shape[1] - 0.5)
        ax.set_ylim(-0.5, M.shape[0] - 0.5)

    # xy locations of each ellipse center
    xy = np.indices(M.shape)[::-1].reshape(2, -1).T

    # set the relative sizes of the major/minor axes according to the strength of
    # the positive/negative correlation
    w = np.ones_like(M).ravel()
    h = 1 - np.abs(M).ravel()
    a = 45 * np.sign(M).ravel()

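    # transOffset was renamed to offset_transform in newer Matplotlib
    # releases; use transOffset=ax.transData on older ones
    ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
                           offset_transform=ax.transData, array=M.ravel(),
                           **kwargs)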
    ax.add_collection(ec)

    # if data is a DataFrame, use the row/column names as tick labels
    if isinstance(data, pd.DataFrame):
        ax.set_xticks(np.arange(M.shape[1]))
        ax.set_xticklabels(data.columns, rotation=90)
        ax.set_yticks(np.arange(M.shape[0]))
        ax.set_yticklabels(data.index)

    return ec

For example, using your data:

data = df.corr()
fig, ax = plt.subplots(1, 1)
m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)


Negative correlations can be plotted as ellipses with the opposite orientation:

fig2, ax2 = plt.subplots(1, 1)
data2 = np.linspace(-1, 1, 9).reshape(3, 3)
m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
cb2 = fig2.colorbar(m2)
ax2.margins(0.3)
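To see how plot_corr_ellipses turns a coefficient into a shape, the geometry can be isolated in a small helper. This is a sketch of the same mapping used in the function above; the helper name ellipse_params is made up for illustration and is not part of matplotlib.

```python
import numpy as np

def ellipse_params(corr):
    """Map correlation coefficients to (width, height, angle in degrees).

    Hypothetical helper mirroring the w, h, a lines in plot_corr_ellipses.
    """
    r = np.asarray(corr, dtype=float)
    widths = np.ones_like(r)       # major axis always spans one cell
    heights = 1.0 - np.abs(r)      # |r| = 1 collapses to a line, r = 0 is a circle
    angles = 45.0 * np.sign(r)     # tilt one way for r > 0, the other for r < 0
    return widths, heights, angles

w, h, a = ellipse_params([-1.0, -0.5, 0.0, 0.5, 1.0])
for r, hi, ai in zip([-1.0, -0.5, 0.0, 0.5, 1.0], h, a):
    print(f"r = {r:+.1f}: height = {hi:.1f}, angle = {ai:+.0f} deg")
```

So the strength of the correlation is encoded by how thin the ellipse is, and its sign by which way the ellipse leans; the colour map repeats the same information.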


ali_m answered Oct 16 '22 15:10



Assuming you are interested in showing cluster relations, the seaborn package mentioned in the comments also has a clustermap. Using your correlation matrix (it looks like you want to show the correlation coefficients as integers in the [-100, 100] range), you could do the following:

corr = df.corr().mul(100).astype(int)

     GX   HG   RM   SJ   XB   XN   ZG
GX  100   77   62   71   48   66   57
HG   77  100   69   74   61   61   58
RM   62   69  100   75   48   64   68
SJ   71   74   75  100   50   70   65
XB   48   61   48   50  100   46   51
XN   66   61   64   70   46  100   75
ZG   57   58   68   65   51   75  100

and then use seaborn.clustermap() as follows:

import seaborn as sns
sns.clustermap(data=corr, annot=True, fmt='d', cmap='Greens').savefig('cluster.png')
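One caveat about the integer conversion above: astype(int) truncates toward zero rather than rounding, so a coefficient of 0.779 is shown as 77, not 78. If rounded values are preferred, a round() can be added before the cast. The sketch below uses a small made-up matrix (labels A, B, C and its values are illustrative, not the asker's data).

```python
import pandas as pd

# Made-up 3 x 3 correlation matrix, for illustration only
corr = pd.DataFrame(
    [[1.0, 0.779, -0.779],
     [0.779, 1.0, 0.612],
     [-0.779, 0.612, 1.0]],
    index=list("ABC"),
    columns=list("ABC"),
)

truncated = corr.mul(100).astype(int)          # drops the fractional part
rounded = corr.mul(100).round().astype(int)    # rounds to the nearest integer first

print(truncated)
print(rounded)
```

Either frame can be passed to sns.clustermap with fmt='d' as in the call above.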


Stefan answered Oct 16 '22 17:10
