Weighted Gaussian kernel density estimation in `python`

Tags:

python

statistics

scipy

kernel-density

Update: Weighted samples are now supported by scipy.stats.gaussian_kde. See here and here for details.

It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?
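For readers landing here after the update, a minimal sketch of the built-in route, assuming a SciPy version whose `gaussian_kde` accepts a `weights` argument. The data, the weights and the variable names are made up for illustration only.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative data (made up for this sketch): two clusters, with the
# samples near 5 given three times the weight of those near 0.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
weights = np.concatenate([np.full(200, 1.0), np.full(200, 3.0)])

# The weights must be non-negative, one per sample; gaussian_kde
# normalizes them internally so that they sum to one.
kde = gaussian_kde(samples, weights=weights)

# Evaluate the estimated density on a grid.
grid = np.linspace(-4.0, 9.0, 400)
density = kde(grid)

# Effective sample size that enters the bandwidth rule.
print(kde.neff)
```

The `neff` attribute is smaller than the raw sample count whenever the weights are unequal, which widens the kernels accordingly.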

asked Dec 23 '14 16:12

Till Hoffmann

2 Answers

Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.

[image: example, https://i.stack.imgur.com/bc5Xv.png]

An ipython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5

Implementation details

The weighted arithmetic mean is

[image: weighted arithmetic mean, https://i.stack.imgur.com/zZ7n2.png]

The unbiased data covariance matrix is then given by

[image: unbiased covariance matrix, https://i.stack.imgur.com/zs4Cm.png]
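The two formulas were embedded as images. As a sketch, these are the standard definitions they most likely correspond to, for samples x_i with non-negative weights w_i. The notation is an assumption and is not copied from the images.

```latex
\bar{x} = \frac{\sum_{i=1}^{n} w_i \, x_i}{\sum_{i=1}^{n} w_i}

\Sigma = \frac{\sum_{i=1}^{n} w_i}{\left(\sum_{i=1}^{n} w_i\right)^2 - \sum_{i=1}^{n} w_i^2}
         \sum_{i=1}^{n} w_i \, (x_i - \bar{x})(x_i - \bar{x})^{\mathsf{T}}
```

With equal weights the prefactor reduces to 1/(n - 1), which recovers the usual unbiased sample covariance.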

The bandwidth can be chosen with the scott or silverman rules, as in scipy. However, the number of samples entering those rules is replaced by Kish's approximation for the effective sample size.
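A minimal NumPy sketch of that bandwidth choice, assuming the usual Scott and Silverman factors with n swapped for Kish's effective sample size, (sum of w)^2 / (sum of w^2). The function names and the toy data are hypothetical and are not taken from the modified class.

```python
import numpy as np

def kish_effective_sample_size(weights):
    """Kish's approximation: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def bandwidth_factor(weights, n_dims, rule="scott"):
    """Scott or Silverman factor with n replaced by the effective sample size."""
    n_eff = kish_effective_sample_size(weights)
    if rule == "scott":
        return n_eff ** (-1.0 / (n_dims + 4))
    if rule == "silverman":
        return (n_eff * (n_dims + 2) / 4.0) ** (-1.0 / (n_dims + 4))
    raise ValueError(f"unknown rule: {rule!r}")

def kernel_covariance(data, weights, rule="scott"):
    """Kernel covariance: factor^2 times the unbiased weighted data covariance.

    `data` has shape (n_dims, n_samples), following scipy's convention.
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    # aweights with bias=False yields the unbiased weighted covariance.
    data_cov = np.atleast_2d(np.cov(data, aweights=weights, bias=False))
    factor = bandwidth_factor(weights, data.shape[0], rule)
    return data_cov * factor ** 2

# Toy example: one heavily weighted sample among four.
weights = np.array([3.0, 1.0, 1.0, 1.0])
data = np.array([[10.0, 5.0, 7.0, 8.0]])
print(kish_effective_sample_size(weights))  # 36 / 12 = 3.0
print(kernel_covariance(data, weights))
```

For equal weights the effective sample size equals the number of samples, so the rules reduce to their unweighted form.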

answered Oct 28 '22 13:10

Till Hoffmann

For univariate distributions you can use KDEUnivariate from statsmodels. It is not well documented, but the fit method accepts a weights argument. When weights are given, FFT cannot be used, so pass fft=False. Here is an example:

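import matplotlib.pyplot as plt
import numpy as np
from statsmodels.nonparametric.kde import KDEUnivariate

# Unweighted fit: the value 10 appears three times in the data.
kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
kde1.fit(bw=0.5)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

# Weighted fit: the same information expressed through weights.
# Passing weights requires fft=False.
kde1 = KDEUnivariate(np.array([10., 5.]))
kde1.fit(weights=np.array([3., 1.]),
         bw=0.5,
         fft=False)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'o-')
plt.show()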

which produces this figure: [figure]
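The statsmodels example above compares a sample in which the value 10 is repeated three times with a two-point sample weighted 3:1. A plain NumPy sketch of why the two should agree, assuming a fixed-bandwidth Gaussian kernel (the helper `weighted_gaussian_kde` is hypothetical and not part of statsmodels):

```python
import numpy as np

def weighted_gaussian_kde(points, samples, weights, bw):
    """Evaluate a 1-D weighted Gaussian KDE with fixed bandwidth `bw`."""
    points = np.asarray(points, dtype=float)
    samples = np.asarray(samples, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the density integrates to one
    # Standardized distance from every evaluation point to every sample.
    z = (points[:, None] - samples[None, :]) / bw
    kernels = np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2.0 * np.pi))
    # Weighted sum of the kernels centred on each sample.
    return kernels @ w

grid = np.linspace(2.0, 13.0, 200)
repeated = weighted_gaussian_kde(grid, [10.0, 10.0, 10.0, 5.0], np.ones(4), bw=0.5)
weighted = weighted_gaussian_kde(grid, [10.0, 5.0], [3.0, 1.0], bw=0.5)
print(np.allclose(repeated, weighted))  # True
```

Direct evaluation like this costs one kernel evaluation per pair of grid point and sample, which is the work an FFT-based estimate avoids for unweighted data.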

answered Oct 28 '22 14:10

Ramon Crehuet
