Extract histogram modes by detecting the local maxima of a vector with NumPy/SciPy

Tags:

Is there a way with NumPy/SciPy` to keep only the histogram modes when extracting the local maxima (shown as blue dots on the image below)?:

These maxima were extracted using scipy.signal.argrelmax, but I only need to get the two modes values and ignore the rest of the maxima detected:

# calculate dB positive image
img_db = 10 * np.log10(img)
img_db_pos = img_db + abs(np.min(img_db))
data = img_db_pos.flatten() + 1

# data histogram
n, bins = np.histogram(data, 100, normed=True)

# trim data
x = np.linspace(np.min(data), np.max(data), num=100)

# find index of minimum between two modes
ind_max = argrelmax(n)
x_max = x[ind_max]
y_max = n[ind_max]

# plot
plt.hist(data, bins=100, normed=True, color='y')
plt.scatter(x_max, y_max, color='b')
plt.show()

Note:

I've managed to use this Smoothing filter to get a curve that matches the histogram (but I don't have the equation of the curve).

261

asked Mar 22 '17 12:03

Hakim

2 Answers

This could be achieved by appropriately adjusting parameter order of function argrelmax, i.e. by adjusting how many points on each side are considered to detect local maxima.

I've used this code to create mock data (you can play around with the values of the different variables to figure out their effect on the generated histogram):

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import argrelextrema, argrelmax

m = 50
s = 10
samples = 50000
peak_start = 30
peak_width = 10
peak_gain = 4

np.random.seed(3)
data = np.random.normal(loc=m, scale=s, size=samples)
bell, edges = np.histogram(data, bins=np.arange(2*(m + 1) - .5), normed=True)
x = np.int_(.5*(edges[:-1] + edges[1:]))

bell[peak_start + np.arange(peak_width)] *= np.linspace(1, peak_gain, peak_width)

plt.bar(x, bell)
plt.show()

mock data

As shown below, it is important to carefully select the value of order. Indeed, if order is too small you are likely to detect noisy local maxima, whereas if order is too large you might fail to detect some of the modes.

In [185]: argrelmax(bell, order=1)
Out[185]: (array([ 3,  5,  7, 12, 14, 39, 47, 51, 86, 90], dtype=int64),)

In [186]: argrelmax(bell, order=2)
Out[186]: (array([14, 39, 47, 51, 90], dtype=int64),)

In [187]: argrelmax(bell, order=3)
Out[187]: (array([39, 47, 51], dtype=int64),)

In [188]: argrelmax(bell, order=4)
Out[188]: (array([39, 51], dtype=int64),)

In [189]: argrelmax(bell, order=5)
Out[189]: (array([39, 51], dtype=int64),)   

In [190]: argrelmax(bell, order=11)
Out[190]: (array([39, 51], dtype=int64),)

In [191]: argrelmax(bell, order=12)
Out[191]: (array([39], dtype=int64),)

These results are strongly dependent on the shape of the histogram (if you change just one of the parameters used to generate data the range of valid values for order may vary). To make mode detection more robust, I would recommend you to pass a smoothed histogram to argrelmax rather than the original histogram.

192

answered Sep 28 '22 13:09

Tonechas

I guess, you want to find second largest number in y_max. Hope this example will help you:

np.random.seed(4)  # for reproducibility
data = np.zeros(0)
for i in xrange(10):
    data = np.hstack(( data, np.random.normal(i, 0.25, 100*i) ))

# data histogram
n, bins = np.histogram(data, 100, normed=True)

# trim data
x = np.linspace(np.min(data), np.max(data), num=100)

# find index of minimum between two modes
ind_max = argrelmax(n)
x_max = x[ind_max]
y_max = n[ind_max]

# find first and second max values in y_max
index_first_max = np.argmax(y_max)
maximum_y = y_max[index_first_max]
second_max_y = max(n for n in y_max if n!=maximum_y)
index_second_max = np.where(y_max == second_max_y)

# plot
plt.hist(data, bins=100, normed=True, color='y')
plt.scatter(x_max, y_max, color='b')
plt.scatter(x_max[index_first_max], y_max[index_first_max], color='r')
plt.scatter(x_max[index_second_max], y_max[index_second_max], color='g')
plt.show()

enter image description here

answered Sep 28 '22 13:09

enclis

Related questions
                            
                                Intersection of multiple pandas dataframes
                            
                                Python/sklearn - preprocessing.MinMaxScaler 1d deprecation
                            
                                error when "sudo pip" on an AWS ec2 instance
                            
                                User Input Validation in PyQt5 and Python
                            
                                Counting recursive calls of a function
                            
                                Unable to run the following code using python bindings for vlc
                            
                                Pop up message box in PyGame
                            
                                Pandas : Use groupby on each element of list
                            
                                Reinstall python 2.7.12 and python 3.5.2
                            
                                Add PySpark RDD as new column to pyspark.sql.dataframe
                            
                                Python Count the number of substring in list from other string list without duplicates
                            
                                Model help using Scikit-learn when using GridSearch
                            
                                How to connect to Redis running in Docker container via CLI
                            
                                Can I use xor on numpy matrices?
                            
                                Equivalent of python's encode('utf8') in golang
                            
                                How to iterate through a list of tuples containing three pair values?
                            
                                Limit on the number of values returned by a Python function/method
                            
                                Renaming columns using numbers from a range in python/pandas
                            
                                Select a List Slices of a Pandas Multiindex/Multicolumn DataFrame
                            
                                Numpy/Python Array Value error

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Extract histogram modes by detecting the local maxima of a vector with NumPy/SciPy

Tags:

python

numpy

scipy

Hakim

People also ask

2 Answers

Tonechas

enclis

Recent Activity

Donate For Us