I'm trying to build a filter (to remove outliers and noise) using kernel density estimation (KDE). I applied KDE to my 3D (d = 3) data points and it gives me the probability density function (PDF) f(x). As we know, the local maxima of the density estimate f(x) define the centers of the clusters of data points. So my idea is to use f(x) to determine those clusters.
My question is: what method is best suited for finding the local maxima of f(x)? If anyone can provide some example code or an idea, I would really appreciate it.
Here is the code that computes the KDE, giving f(x) for 3D data.
import numpy as np
from scipy import stats

data = np.array([[1, 4, 3], [2, .6, 1.2], [2, 1, 1.2],
                 [2, 0.5, 1.4], [5, .5, 0], [0, 0, 0],
                 [1, 4, 3], [5, .5, 0], [2, .5, 1.2]])
data = data.T  # gaussian_kde expects shape (d, N)

kde = stats.gaussian_kde(data)

# Build an evaluation grid spanning the data range in each dimension
minima = data.T.min(axis=0)
maxima = data.T.max(axis=0)
space = [np.linspace(mini, maxi, 20) for mini, maxi in zip(minima, maxima)]
grid = np.meshgrid(*space)
coords = np.vstack([g.ravel() for g in grid])

# Evaluate the KDE-estimated pdf at each grid coordinate
density = kde(coords)
Kernel density estimation (KDE) builds the estimate by summing a kernel value K contributed by every data point Xj and normalizing the sum by the number of data points n.
The kernel density estimator is
f_h(x) = (1/n) Σ_{j=1..n} K_h(x − X_j) = (1/(n h)) Σ_{j=1..n} K((x − X_j)/h)
where K is the kernel (a non-negative function) and h > 0 is a smoothing parameter called the bandwidth. The kernel with subscript h is called the scaled kernel and is defined as K_h(x) = (1/h) K(x/h).
The y-axis in a density plot is the probability density from the kernel density estimate. Be careful: this is a probability density, not a probability. The density is the probability per unit on the x-axis, so it only becomes a probability once you integrate it over an interval.
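To make the formula above concrete, here is a minimal 1D sketch (the sample points and the bandwidth h are made-up values, purely for illustration) that evaluates the estimator by summing the scaled kernels and dividing by n:
import numpy as np

x_j = np.array([1.0, 2.0, 2.5, 4.0, 4.2, 6.0])  # sample points, n = 6
n = len(x_j)
h = 1.0                                          # bandwidth (illustrative choice)

def gaussian_kernel(u):
    # standard normal density
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def f_hat(x):
    # f_h(x) = 1/(n*h) * sum_j K((x - X_j) / h)
    return gaussian_kernel((x - x_j) / h).sum() / (n * h)

print(f_hat(2.0))  # density estimate at x = 2.0
Note that scipy's gaussian_kde picks its own bandwidth (Scott's rule by default), so its values will differ from this fixed-h sketch.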
You will want to use an algorithm called mean shift. It's a clustering algorithm that works by finding the modes (i.e., the local maxima of f(x)) of the KDE. Note that the bandwidth set for your KDE will affect the number of modes and their locations. Since you are using Python, there is an implementation in scikit-learn (sklearn.cluster.MeanShift).
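A minimal sketch using the example points from the question (the quantile passed to estimate_bandwidth is an assumption you would tune, since the bandwidth controls how many modes mean shift finds):
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Points in the (N, d) layout expected by scikit-learn
points = np.array([[1, 4, 3], [2, .6, 1.2], [2, 1, 1.2],
                   [2, 0.5, 1.4], [5, .5, 0], [0, 0, 0],
                   [1, 4, 3], [5, .5, 0], [2, .5, 1.2]])

# estimate_bandwidth gives a starting value; adjust it and see how the
# number of cluster centers changes
bw = estimate_bandwidth(points, quantile=0.3)
ms = MeanShift(bandwidth=bw).fit(points)

print(ms.cluster_centers_)  # the modes of the underlying density estimate
print(ms.labels_)           # cluster assignment of each point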
Here is a short function that demonstrates how you could estimate the maximum. Note: the higher no_samples is, the more accurate the estimate of the maximum.
from scipy.stats import gaussian_kde
import numpy as np

def estimate_maxima(data):
    # Fit a KDE to the (1D) data
    kde = gaussian_kde(data)
    # Evaluate the density on an evenly spaced grid over the data range
    no_samples = 10
    samples = np.linspace(min(data), max(data), no_samples)
    probs = kde.evaluate(samples)
    # The grid point with the highest density is the estimated maximum
    maxima_index = probs.argmax()
    maxima = samples[maxima_index]
    return maxima
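For example, called on some made-up 1D data (the sample below is only for illustration):
data_1d = np.random.default_rng(0).normal(loc=5.0, scale=1.0, size=500)
print(estimate_maxima(data_1d))  # should land near 5.0, up to the grid resolution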
You could use scipy.optimize.
Example on 1D-data:
import numpy as np
from scipy import optimize
from scipy import stats
# Generate some random data
shape, loc, scale = .5, 3, 10
n = 1000
data = np.sort(stats.lognorm.rvs(shape, loc, scale, size=n))
kernel = stats.gaussian_kde(data)
# Minimize the negative instead of maximizing
# Depending on the shape of your data, you might want to set some bounds
opt = optimize.minimize_scalar(lambda x: -kernel(x))
opt
fun: array([-0.08363781])
nfev: 21
nit: 14
success: True
x: array([10.77361776])
The actual mode of this distribution is at
mode = scale/np.exp(shape**2) + loc
mode
10.788007830714049
Plotting the results:
import matplotlib.pyplot as plt
data_es = np.linspace(0, data.max(), 201) # x-axis points
ecdf = (np.arange(n) + 1)/n # empirical CDF
fig, axes = plt.subplots(2, 1, sharex=True, dpi=300, figsize=(6,7))
axes[0].hist(data, bins=30, density=True, alpha=.5, rwidth=.9) # histogram
axes[0].plot(data_es, kernel.pdf(data_es), 'C0') # estimated PDF
axes[0].plot(data_es, stats.lognorm.pdf(data_es, shape, loc, scale), 'k--', alpha=.5) # true PDF
axes[0].plot(opt.x, kernel.pdf(opt.x), 'C0.') # estimated mode
axes[0].plot(mode, stats.lognorm.pdf(mode, shape, loc, scale), 'k.', alpha=.5) # true mode
axes[1].plot(np.sort(data), ecdf) # estimated CDF
axes[1].plot(opt.x, np.interp(opt.x, np.sort(data), ecdf), 'C0.') #estimated mode
axes[1].plot(data_es, stats.lognorm.cdf(data_es, shape, loc, scale), 'k--', alpha=.5) # true CDF
axes[1].plot(mode, stats.lognorm.cdf(mode, shape, loc, scale), 'k.', alpha=.5) # true mode
fig.tight_layout()
As you can see, the estimated mode fits pretty well. I assume the approach can be extended to multivariate data with other methods from scipy.optimize.
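For instance, here is a sketch of one way to do that for the 3D case, assuming the kde, coords and density variables from the question's snippet are in scope. It refines the densest grid point with a multivariate optimizer, so it only finds the mode nearest that starting point; to recover several local maxima you could repeat it from each high-density grid point.
from scipy import optimize

# Start from the densest grid point and minimize the negative density
x0 = coords[:, density.argmax()]
res = optimize.minimize(lambda x: -kde(x)[0], x0)
print(res.x)  # an estimated mode of the 3D density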