I need to calculate the area where two functions overlap. I use normal distributions in this particular simplified example, but I need a more general procedure that adapts to other functions too.
See image below to get an idea of what I mean, where the red area is what I'm after:
This is the MWE I have so far:
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
# Generate random data uniformly distributed.
a = np.random.normal(1., 0.1, 1000)
b = np.random.normal(1., 0.1, 1000)
# Obtain KDE estimates foe each set of data.
xmin, xmax = -1., 2.
x_pts = np.mgrid[xmin:xmax:1000j]
# Kernels.
ker_a = stats.gaussian_kde(a)
ker_b = stats.gaussian_kde(b)
# KDEs for plotting.
kde_a = np.reshape(ker_a(x_pts).T, x_pts.shape)
kde_b = np.reshape(ker_b(x_pts).T, x_pts.shape)
# Random sample from a KDE distribution.
sample = ker_a.resample(size=1000)
# Compute the points below which to integrate.
iso = ker_b(sample)
# Filter the sample.
insample = ker_a(sample) < iso
# As per Monte Carlo, the integral is equivalent to the
# probability of drawing a point that gets through the
# filter.
integral = insample.sum() / float(insample.shape[0])
print integral
plt.xlim(0.4,1.9)
plt.plot(x_pts, kde_a)
plt.plot(x_pts, kde_b)
plt.show()
where I apply Monte Carlo
to obtain the integral.
The problem with this method is that when I evaluate sampled points in either distribution with ker_b(sample)
(or ker_a(sample)
), I get values placed directly over the KDE line. Because of this, even clearly overlapped distributions which should return a common/overlapped area value very close to 1. return instead small values (the total area of either curve is 1. since they are probability density estimates).
How could I fix this code to give the expected results?
This is how I applied Zhenya's answer
# Calculate overlap between the two KDEs.
def y_pts(pt):
y_pt = min(ker_a(pt), ker_b(pt))
return y_pt
# Store overlap value.
overlap = quad(y_pts, -1., 2.)
The area between two curves is calculated by the formula: Area = ∫ba[f(x)−g(x)]dx ∫ a b [ f ( x ) − g ( x ) ] d x which is an absolute value of the area.
The red area on the plot is the integral of min(f(x), g(x))
, where f
and g
are your two functions, green and blue. To evaluate the integral, you can use any of the integrators from scipy.integrate
(quad
's the default one, I'd say) -- or an MC integrator, of course, but I don't quite see the point of that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With