Given a probability distribution with unknown functional form (example below), I like to plot "percentile-based" contour lines, i.e.,those that correspond to regions with an integral of 10%, 20%, ..., 90% etc.
## example of an "arbitrary" probability distribution ##
from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import numpy as np
X, Y = np.mgrid[-3:3:100j, -3:3:100j]
z1 = bivariate_normal(X, Y, .5, .5, 0., 0.)
z2 = bivariate_normal(X, Y, .4, .4, .5, .5)
z3 = bivariate_normal(X, Y, .6, .2, -1.5, 0.)
z = z1+z2+z3
plt.imshow(np.reshape(z.T, (100,-1)), origin='lower', extent=[-3,3,-3,3])
plt.show()
I've looked into multiple approaches, from using the default contour function in matplotlib, methods involving stats.gaussian_kde in scipy, and even perhaps generating random point samples from the distribution and estimating a kernel afterwards. None of them appears to provide the solution.
Look at the integral of p(x) inside the contour p(x) ≥ t and solve for the desired value of t:
import matplotlib
from matplotlib.mlab import bivariate_normal
import matplotlib.pyplot as plt
import numpy as np
X, Y = np.mgrid[-3:3:100j, -3:3:100j]
z1 = bivariate_normal(X, Y, .5, .5, 0., 0.)
z2 = bivariate_normal(X, Y, .4, .4, .5, .5)
z3 = bivariate_normal(X, Y, .6, .2, -1.5, 0.)
z = z1 + z2 + z3
z = z / z.sum()
n = 1000
t = np.linspace(0, z.max(), n)
integral = ((z >= t[:, None, None]) * z).sum(axis=(1,2))
from scipy import interpolate
f = interpolate.interp1d(integral, t)
t_contours = f(np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]))
plt.imshow(z.T, origin='lower', extent=[-3,3,-3,3], cmap="gray")
plt.contour(z.T, t_contours, extent=[-3,3,-3,3])
plt.show()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With