Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Integrate 2D kernel density estimate

I have a x,y distribution of points for which I obtain the KDE through scipy.stats.gaussian_kde. This is my code and how the output looks (the x,y data can be obtained from here):

import numpy as np
from scipy import stats

# Obtain data from file.
data = np.loadtxt('data.dat', unpack=True)
m1, m2 = data[0], data[1]
xmin, xmax = min(m1), max(m1)
ymin, ymax = min(m2), max(m2)

# Perform a kernel density estimate (KDE) on the data
x, y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([x.ravel(), y.ravel()])
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values)
f = np.reshape(kernel(positions).T, x.shape)

# Define the number that will determine the integration limits
x1, y1 = 2.5, 1.5

# Perform integration?

# Plot the results:
import matplotlib.pyplot as plt
# Set limits
plt.xlim(xmin,xmax)
plt.ylim(ymin,ymax)
# KDE density plot
plt.imshow(np.rot90(f), cmap=plt.cm.gist_earth_r, extent=[xmin, xmax, ymin, ymax])
# Draw contour lines
cset = plt.contour(x,y,f)
plt.clabel(cset, inline=1, fontsize=10)
plt.colorbar()
# Plot point
plt.scatter(x1, y1, c='r', s=35)
plt.show()

result

The red point with coordinates (x1, y1) has (like every point in the 2D plot) an associated value given by f (the kernel or KDE) between 0 and 0.42. Let's say that f(x1, y1) = 0.08.

I need to integrate f with integration limits in x and y given by those regions where f evaluates to less than f(x1, y1), ie: f(x, y)<0.08.

For what I've seen python can perform integration of functions and one dimensional arrays through numerical integration, but I haven't seen anything that would let me perform a numerical integration on a 2D array (the f kernel) Furthermore, I'm not sure how I would even recognize the regions given by that particular condition (ie: f(x, y)less than a given value)

Can this be done at all?

like image 905
Gabriel Avatar asked Jul 23 '13 22:07

Gabriel


People also ask

How is kernel density estimate calculated?

Kernel Density Estimation (KDE) It is estimated simply by adding the kernel values (K) from all Xj. With reference to the above table, KDE for whole data set is obtained by adding all row values. The sum is then normalized by dividing the number of data points, which is six in this example.

What is kernel density estimation?

In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights.

How do you plot kernel density in Excel?

First select the empty cell in your worksheet where you wish for the output table to be generated, then click on the descriptive statistics icon in anomic cell tab and select kernel density estimation from the drop down menu.

What is Gaussian_kde?

gaussian_kde(dataset, bw_method=None, weights=None)[source] Representation of a kernel-density estimate using Gaussian kernels. Kernel density estimation is a way to estimate the probability density function (PDF) of a random variable in a non-parametric way.


2 Answers

Here is a way to do it using monte carlo integration. It is a little slow, and there is randomness in the solution. The error is inversely proportional to the square root of the sample size, while the running time is directly proportional to the sample size (where sample size refers to the monte carlo sample (10000 in my example below), not the size of your data set). Here is some simple code using your kernel object.

#Compute the point below which to integrate
iso = kernel((x1,y1))

#Sample from your KDE distribution
sample = kernel.resample(size=10000)

#Filter the sample
insample = kernel(sample) < iso

#The integral you want is equivalent to the probability of drawing a point 
#that gets through the filter
integral = insample.sum() / float(insample.shape[0])
print integral

I get approximately 0.2 as the answer for your data set.

like image 122
jcrudy Avatar answered Sep 18 '22 15:09

jcrudy


Currently, it is available

kernel.integrate_box([-np.inf,-np.inf], [2.5,1.5])

like image 37
Pavel Prochazka Avatar answered Sep 18 '22 15:09

Pavel Prochazka