Resampling a KDE (Kernel density estimation) in statsmodels

Question

I am interested in using a sample of points to construct a KDE, and then using that KDE to resample points. scipy.stats.gaussian_kde offers a very simple way to do this. For example, sampling from a gaussian distribution:

import numpy as np
from scipy.stats import gaussian_kde, norm

sampled = np.random.normal(loc = 0, scale = 1, size = 1000)
kde = gaussian_kde(sampled, bw_method = 'silverman')
resampled = kde.resample(1000)

One flaw with scipy.stats.gaussian_kde is that it offers limited choices for bandwidth selection. Reading through this, I was pointed to statsmodels.nonparametric.kernel_density.KDEMultivariate (more information here). This lets me use cross validation to estimate the optimal bandwidth, which is more sophisticated if the underlying pdf you are trying to approximate is not unimodal. For example, using a sum of two gaussians, I can construct a KDE using KDEMultivariate as follows:

from statsmodels.nonparametric.kernel_density import KDEMultivariate
sampled = np.concatenate((np.random.normal(loc = -3, scale = 1, size = 1000), \
                          np.random.normal(loc = 3, scale = 1, size = 1000)))
kde = KDEMultivariate(sampled, 'c', bw = 'cv_ml')

Exploring higher dimensional data with an arbitrary underlying pdf, it is clear that KDEMultivariate is able to produce a PDF that is much more representative of the original PDF. But there is a big issue I am running into -- KDEMultivariate has no kde.resample() method, and I am therefore unable to resample points from my new KDE. Is there a simple and efficient way to resample from a KDE constructed using statsmodels.nonparametric.kernel_density.KDEMultivariate?

Peter · Accepted Answer

Using motivation from gaussian_kde in scipy, I wrote a simple resampling procedure. In statsmodels, the bandwidth corresponds to the SD of the gaussian kernels in each dimension. In scipy, the bandwidth^2 is multiplied by the data covariance to construct the covariance matrix.

def resample(kde, size):
    n, d = kde.data.shape
    indices = np.random.randint(0, n, size)
    cov = np.diag(kde.bw)**2
    means = kde.data[indices, :]
    norm = np.random.multivariate_normal(np.zeros(d), cov, size)
    return np.transpose(means + norm)

This takes an instance of KDEMultivariate, and resamples it by selecting random kernels, then sampling from those kernels using the fact that each kernel follows a multivariate normal distribution.

Resampling a KDE (Kernel density estimation) in statsmodels

Tags:

numpy

scipy

statsmodels

Enrique Welsh

1 Answers

Peter

Recent Activity

Donate For Us

Resampling a KDE (Kernel density estimation) in statsmodels

Tags:

numpy

scipy

statsmodels

Enrique Welsh

1 Answers

Peter

Related questions

Recent Activity

Donate For Us