I am trying to write a function to produce Matlab style correlation plots using matplotlib in Python 3.4 (example here). However, I want to change the plot so that the diagonal subplots display the name of the variable, the lower triangle subplots display the Pearson correlation coefficient, and the upper triangle subplots display a scatter plot. Below is some code to generate sample data and the function I wrote. It displays the appropriate 4x4 grid of subplots with variable names and correlation coefficients in the correct place, but the scatter plots do not show up.
import numpy as np
import matplotlib.pyplot as plt
means = [0, 1, 0, 2]
sig = [[1, 0.5, 0, -0.1], [0.5, 3, 0, 0.2], [0, -0.1, 1, -0.3], [-0.1, 0.2, -0.3, 1]]
data = np.random.multivariate_normal(means, sig, 50)
names = ['Var' + str(i) for i in range(data.shape[1])]
def corrplot(data, names):
    corrMat = np.corrcoef(data, rowvar = 0)
    numVars = data.shape[1]
    fig, ax = plt.subplots(numVars, numVars, sharex = "col", sharey = "row")
    fig.subplots_adjust(wspace = 0, hspace = 0)
    for i in range(numVars):
        for j in range(numVars):
            if i == j: # On the diagonal
                ax[i, j].text(0.5, 0.5, names[i], transform = ax[i, j].transAxes)
            elif i < j: # In the upper triangle
                ax[i, j].scatter(data[:, i], data[:, j], marker = '.')
            elif i > j: # In the lower triangle
                ax[i, j].text(0.5, 0.5, str(round(corrMat[i, j], 3)), transform = ax[i, j].transAxes)
    plt.show()
In an attempt to identify the source of the problem, I manually reconstructed the plot for a 2 variable case using the following code, which produces the desired plot:
fig, ax = plt.subplots(2, 2, sharex = "col", sharey = "row")
fig.subplots_adjust(wspace = 0, hspace = 0)
ax[0, 0].text(0.5, 0.5, 'Var0', transform = ax[0, 0].transAxes)
ax[0, 1].scatter(data[:, 0], data[:, 1], marker = '.')
ax[1, 0].text(0.5, 0.5, '0.5', transform = ax[1, 0].transAxes)
ax[1, 1].text(0.5, 0.5, 'Var1', transform = ax[1, 1].transAxes)
plt.show()
Since this works, I hypothesized that the problem had nothing to do with mixing text and data in the subplots. I wrote the next function to test populating the subplots using a for loop, and it produces a scatter plot in each subplot as expected.
def test1(data):
    numVars = data.shape[1]
    fig, ax = plt.subplots(numVars, numVars, sharex = "col", sharey = "row")
    fig.subplots_adjust(wspace = 0, hspace = 0)
    for i in range(numVars):
        for j in range(numVars):
            ax[i, j].scatter(data[:, i], data[:, j], marker = '.')
    plt.show()
Next, I tried populating only a subset of the subplots (the upper triangle) using for loops. The following function produces a blank grid.
def test2(data):
    numVars = data.shape[1]
    fig, ax = plt.subplots(numVars, numVars, sharex = "col", sharey = "row")
    fig.subplots_adjust(wspace = 0, hspace = 0)
    for i in range(numVars):
        for j in range(i + 1, numVars):
            ax[i, j].scatter(data[:, i], data[:, j], marker = '.')
    plt.show()
This leads me to believe that there is some error related to the for loops and how the scatter plots are being created, but I haven't been able to find the error yet.
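One way to narrow this down is to check whether the scatter points exist in each subplot and are simply not being drawn. The sketch below is only a diagnostic idea, not part of the original code: the helper name summarize_scatter, the demo data, and the Agg backend are all assumptions made so the example runs on its own. It rebuilds the upper-triangle grid from test2, then reports how many points each subplot holds along with its axis limits and the installed matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def summarize_scatter(ax_grid):
    """Return (row, col, n_points, xlim, ylim) for each subplot that holds scatter data.

    Hypothetical helper for debugging; it only reads state from the Axes objects.
    """
    rows = []
    for (i, j), axis in np.ndenumerate(ax_grid):
        # Each scatter() call adds one collection; its offsets are the (x, y) points.
        n_points = sum(len(c.get_offsets()) for c in axis.collections)
        if n_points:
            rows.append((i, j, n_points, axis.get_xlim(), axis.get_ylim()))
    return rows


# Demo data standing in for the question's `data` array (50 samples, 4 variables).
rng = np.random.default_rng(0)
demo = rng.normal(size=(50, 4))

# Same layout and loop bounds as test2: only the upper triangle is populated.
fig, ax = plt.subplots(4, 4, sharex="col", sharey="row")
fig.subplots_adjust(wspace=0, hspace=0)
for i in range(4):
    for j in range(i + 1, 4):
        ax[i, j].scatter(demo[:, i], demo[:, j], marker=".")

summary = summarize_scatter(ax)
print("matplotlib", matplotlib.__version__)
for row in summary:
    print(row)
```

If every upper-triangle subplot reports 50 points with sensible limits, the points were created and the blank grid is more likely a rendering issue (for example, the marker) than a problem with the loops.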
Your code shows exactly the desired plot. I think your version of matplotlib didn't recognize marker = '.'. You could try plotting with the default marker (omit marker = '.') or replace it with marker = 'o'.
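As a rough sketch of that suggestion, here is the same function with only the marker changed to 'o'. The name corrplot_o, the small marker size s=10, the centered text alignment, the random demo data, and the Agg backend are assumptions added for illustration. The function returns the figure instead of calling plt.show() so it can be saved or inspected.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def corrplot_o(data, names):
    """Variant of corrplot that uses marker='o' (hypothetical name)."""
    corr_mat = np.corrcoef(data, rowvar=False)
    num_vars = data.shape[1]
    fig, ax = plt.subplots(num_vars, num_vars, sharex="col", sharey="row",
                           squeeze=False)
    fig.subplots_adjust(wspace=0, hspace=0)
    for i in range(num_vars):
        for j in range(num_vars):
            cell = ax[i, j]
            if i == j:  # diagonal: variable name
                cell.text(0.5, 0.5, names[i], transform=cell.transAxes,
                          ha="center", va="center")
            elif i < j:  # upper triangle: scatter plot with the 'o' marker
                cell.scatter(data[:, i], data[:, j], marker="o", s=10)
            else:  # lower triangle: Pearson correlation coefficient
                cell.text(0.5, 0.5, str(round(corr_mat[i, j], 3)),
                          transform=cell.transAxes, ha="center", va="center")
    return fig, ax


# Demo data standing in for the question's `data` and `names`.
rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 4))
sample_names = ["Var" + str(i) for i in range(sample.shape[1])]
fig, ax = corrplot_o(sample, sample_names)
```

To view the result, call fig.savefig("corrplot.png"), or drop the matplotlib.use("Agg") line and call plt.show() in an interactive session.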