I am interested in using a sample of points to construct a KDE, and then using that KDE to resample points. scipy.stats.gaussian_kde
offers a very simple way to do this. For example, sampling from a gaussian distribution:
import numpy as np
from scipy.stats import gaussian_kde, norm
sampled = np.random.normal(loc = 0, scale = 1, size = 1000)
kde = gaussian_kde(sampled, bw_method = 'silverman')
resampled = kde.resample(1000)
One flaw with scipy.stats.gaussian_kde
is that it offers limited choices for bandwidth selection. Reading through this, I was pointed to statsmodels.nonparametric.kernel_density.KDEMultivariate
(more information here). This lets me use cross validation to estimate the optimal bandwidth, which is more sophisticated if the underlying pdf you are trying to approximate is not unimodal. For example, using a sum of two gaussians, I can construct a KDE using KDEMultivariate
as follows:
from statsmodels.nonparametric.kernel_density import KDEMultivariate
sampled = np.concatenate((np.random.normal(loc = -3, scale = 1, size = 1000), \
np.random.normal(loc = 3, scale = 1, size = 1000)))
kde = KDEMultivariate(sampled, 'c', bw = 'cv_ml')
Exploring higher dimensional data with an arbitrary underlying pdf, it is clear that KDEMultivariate
is able to produce a PDF that is much more representative of the original PDF. But there is a big issue I am running into -- KDEMultivariate
has no kde.resample()
method, and I am therefore unable to resample points from my new KDE. Is there a simple and efficient way to resample from a KDE constructed using statsmodels.nonparametric.kernel_density.KDEMultivariate
?
Using motivation from gaussian_kde
in scipy
, I wrote a simple resampling procedure. In statsmodels
, the bandwidth corresponds to the SD of the gaussian kernels in each dimension. In scipy, the bandwidth^2 is multiplied by the data covariance to construct the covariance matrix.
def resample(kde, size):
n, d = kde.data.shape
indices = np.random.randint(0, n, size)
cov = np.diag(kde.bw)**2
means = kde.data[indices, :]
norm = np.random.multivariate_normal(np.zeros(d), cov, size)
return np.transpose(means + norm)
This takes an instance of KDEMultivariate
, and resamples it by selecting random kernels, then sampling from those kernels using the fact that each kernel follows a multivariate normal distribution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With