Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how does 2d kernel density estimation in python (sklearn) work?

I am sorry for the probably stupid question but I am trying now for hours to estimate a density from a set of 2d data. Let's assume my data is given by the array: sample = np.random.uniform(0,1,size=(50,2)) . I just want to use scipys scikit learn package to estimate the density from the sample array (which is here of course a 2d uniform density) and I am trying the following:

import numpy as np
from sklearn.neighbors.kde import KernelDensity
from matplotlib import pyplot as plt
sp = 0.01

samples = np.random.uniform(0,1,size=(50,2))  # random samples
x = y = np.linspace(0,1,100)
X,Y = np.meshgrid(x,y)     # creating grid of data , to evaluate estimated density on

kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(samples) # creating density from samples

kde.score_samples(X,Y) # I want to evaluate the estimated density on the X,Y grid

But the last step always yields the error: score_samples() takes 2 positional arguments but 3 were given

So probably .score_samples cannot take a grid as input, but there no tutorials/docs for the 2d case so I don't know how to fix this issue. It would be really great if someone could help.

like image 359
murph Avatar asked Jan 10 '17 20:01

murph


People also ask

How does kernel density estimation work?

How does a Kernel Density Estimation work? The Kernel Density Estimation works by plotting out the data and beginning to create a curve of the distribution. The curve is calculated by weighing the distance of all the points in each specific location along the distribution.

How is kernel density estimation calculated?

Kernel Density Estimation (KDE) It is estimated simply by adding the kernel values (K) from all Xj. With reference to the above table, KDE for whole data set is obtained by adding all row values. The sum is then normalized by dividing the number of data points, which is six in this example.

What is kernel density estimation it is the implementation?

In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights.


1 Answers

Looking at the Kernel Density Estimate of Species Distributions example, you have to package the x,y data together (both the training data and the new sample grid).

Below is a function that simplifies the sklearn API.

from sklearn.neighbors import KernelDensity

def kde2D(x, y, bandwidth, xbins=100j, ybins=100j, **kwargs): 
    """Build 2D kernel density estimate (KDE)."""

    # create grid of sample locations (default: 100x100)
    xx, yy = np.mgrid[x.min():x.max():xbins, 
                      y.min():y.max():ybins]

    xy_sample = np.vstack([yy.ravel(), xx.ravel()]).T
    xy_train  = np.vstack([y, x]).T

    kde_skl = KernelDensity(bandwidth=bandwidth, **kwargs)
    kde_skl.fit(xy_train)

    # score_samples() returns the log-likelihood of the samples
    z = np.exp(kde_skl.score_samples(xy_sample))
    return xx, yy, np.reshape(z, xx.shape)

This gives you the xx, yy, zz needed for something like a scatter or pcolormesh plot. I've copied the example from the scipy page on the gaussian_kde function.

import numpy as np
import matplotlib.pyplot as plt

m1 = np.random.normal(size=1000)
m2 = np.random.normal(scale=0.5, size=1000)

x, y = m1 + m2, m1 - m2

xx, yy, zz = kde2D(x, y, 1.0)

plt.pcolormesh(xx, yy, zz)
plt.scatter(x, y, s=2, facecolor='white')

example figure

like image 159
Geoff Avatar answered Oct 17 '22 03:10

Geoff