Calculate overlap area of two functions

Tags:

I need to calculate the area where two functions overlap. I use normal distributions in this particular simplified example, but I need a more general procedure that adapts to other functions too.

See image below to get an idea of what I mean, where the red area is what I'm after:

enter image description here

This is the MWE I have so far:

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Generate random data uniformly distributed.
a = np.random.normal(1., 0.1, 1000)
b = np.random.normal(1., 0.1, 1000)

# Obtain KDE estimates foe each set of data.
xmin, xmax = -1., 2.
x_pts = np.mgrid[xmin:xmax:1000j]
# Kernels.
ker_a = stats.gaussian_kde(a)
ker_b = stats.gaussian_kde(b)
# KDEs for plotting.
kde_a = np.reshape(ker_a(x_pts).T, x_pts.shape)
kde_b = np.reshape(ker_b(x_pts).T, x_pts.shape)


# Random sample from a KDE distribution.
sample = ker_a.resample(size=1000)

# Compute the points below which to integrate.
iso = ker_b(sample)

# Filter the sample.
insample = ker_a(sample) < iso

# As per Monte Carlo, the integral is equivalent to the
# probability of drawing a point that gets through the
# filter.
integral = insample.sum() / float(insample.shape[0])

print integral

plt.xlim(0.4,1.9)
plt.plot(x_pts, kde_a)
plt.plot(x_pts, kde_b)

plt.show()

where I apply Monte Carlo to obtain the integral.

The problem with this method is that when I evaluate sampled points in either distribution with ker_b(sample) (or ker_a(sample)), I get values placed directly over the KDE line. Because of this, even clearly overlapped distributions which should return a common/overlapped area value very close to 1. return instead small values (the total area of either curve is 1. since they are probability density estimates).

How could I fix this code to give the expected results?

This is how I applied Zhenya's answer

# Calculate overlap between the two KDEs.
def y_pts(pt):
    y_pt = min(ker_a(pt), ker_b(pt))
    return y_pt
# Store overlap value.
overlap = quad(y_pts, -1., 2.)

908

asked Dec 04 '13 17:12

Gabriel

1 Answers

The red area on the plot is the integral of min(f(x), g(x)), where f and g are your two functions, green and blue. To evaluate the integral, you can use any of the integrators from scipy.integrate (quad's the default one, I'd say) -- or an MC integrator, of course, but I don't quite see the point of that.

answered Sep 17 '22 15:09

ev-br

Related questions
                            
                                Saving a numpy array with mixed data
                            
                                Python: Import file in grandparent directory
                            
                                How to catch an Exception like this on Flask?
                            
                                Push a file to Heroku that's not in my git repo.
                            
                                matplotlib contour plot with lognorm - colorbar levels
                            
                                Python recursive setattr()-like function for working with nested dictionaries
                            
                                Slice pandas series with elements not in the index
                            
                                Writing a simple function using while
                            
                                Looking to quantify the performance overhead of NewRelic monitoring in python django app
                            
                                How to sort a boxplot by the median values in pandas
                            
                                How to create a complete menu using GIO Actions in PyGI GTK?
                            
                                Keeping to 79 char line limit in Python with multiple indents
                            
                                Parsing nested JSON data
                            
                                When is a context manager's __exit__ triggered when inside a generator?
                            
                                Django allowed hosts with port number
                            
                                Django: where do I call settings.configure?
                            
                                Diff on pandas dataframe with more than one column
                            
                                Large scale server application using DDD with Python?
                            
                                NumPy PolyFit and PolyVal in Multiple Dimensions?
                            
                                Scaling issues with scipy.sparse matrix while using scikit

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Calculate overlap area of two functions

Tags:

python

numpy

scipy

montecarlo

Gabriel

People also ask

1 Answers

ev-br

Recent Activity

Donate For Us