Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract histogram modes by detecting the local maxima of a vector with NumPy/SciPy

Is there a way with NumPy/SciPy` to keep only the histogram modes when extracting the local maxima (shown as blue dots on the image below)?:
Histogram

These maxima were extracted using scipy.signal.argrelmax, but I only need to get the two modes values and ignore the rest of the maxima detected:

# calculate dB positive image
img_db = 10 * np.log10(img)
img_db_pos = img_db + abs(np.min(img_db))
data = img_db_pos.flatten() + 1

# data histogram
n, bins = np.histogram(data, 100, normed=True)

# trim data
x = np.linspace(np.min(data), np.max(data), num=100)

# find index of minimum between two modes
ind_max = argrelmax(n)
x_max = x[ind_max]
y_max = n[ind_max]

# plot
plt.hist(data, bins=100, normed=True, color='y')
plt.scatter(x_max, y_max, color='b')
plt.show()

Note:

I've managed to use this Smoothing filter to get a curve that matches the histogram (but I don't have the equation of the curve).

like image 261
Hakim Avatar asked Mar 22 '17 12:03

Hakim


People also ask

How many bins to use for the histogram in SciPy?

This is documentation for an old release of SciPy (version 0.14.0). Search for this page in the documentation of the latest stable release (version 1.7.1). Separates the range into several bins and returns the number of instances in each bin. Array of scores which will be put into bins. The number of bins to use for the histogram. Default is 10.

How to convert NumPy histogram to graphical in Matplotlib?

The above numeric representation of histogram can be converted into a graphical form.The plt () function present in pyplot submodule of Matplotlib takes the array of dataset and array of bin as parameter and creates a histogram of the corresponding data values. Writing code in comment?

How to find local maxima or minima with NumPy in Python?

To find local maxima or minima with Numpy in a 1D Numpy array with Python, we can use the argrelextrema methods. Then we find the local maxima of array x by calling argrelextrema with x and np.greater. Likewise, we find the local minima of array x by calling argrelextrema with x and np.less.

What is the default range of a histogram in Python?

The number of bins to use for the histogram. Default is 10. defaultlimits : tuple (lower, upper), optional The lower and upper values for the range of the histogram. If no value is given, a range slightly larger then the range of the values in a is used. Specifically (a.min () - s, a.max () + s),


2 Answers

This could be achieved by appropriately adjusting parameter order of function argrelmax, i.e. by adjusting how many points on each side are considered to detect local maxima.

I've used this code to create mock data (you can play around with the values of the different variables to figure out their effect on the generated histogram):

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import argrelextrema, argrelmax

m = 50
s = 10
samples = 50000
peak_start = 30
peak_width = 10
peak_gain = 4

np.random.seed(3)
data = np.random.normal(loc=m, scale=s, size=samples)
bell, edges = np.histogram(data, bins=np.arange(2*(m + 1) - .5), normed=True)
x = np.int_(.5*(edges[:-1] + edges[1:]))

bell[peak_start + np.arange(peak_width)] *= np.linspace(1, peak_gain, peak_width)

plt.bar(x, bell)
plt.show()

mock data

As shown below, it is important to carefully select the value of order. Indeed, if order is too small you are likely to detect noisy local maxima, whereas if order is too large you might fail to detect some of the modes.

In [185]: argrelmax(bell, order=1)
Out[185]: (array([ 3,  5,  7, 12, 14, 39, 47, 51, 86, 90], dtype=int64),)

In [186]: argrelmax(bell, order=2)
Out[186]: (array([14, 39, 47, 51, 90], dtype=int64),)

In [187]: argrelmax(bell, order=3)
Out[187]: (array([39, 47, 51], dtype=int64),)

In [188]: argrelmax(bell, order=4)
Out[188]: (array([39, 51], dtype=int64),)

In [189]: argrelmax(bell, order=5)
Out[189]: (array([39, 51], dtype=int64),)   

In [190]: argrelmax(bell, order=11)
Out[190]: (array([39, 51], dtype=int64),)

In [191]: argrelmax(bell, order=12)
Out[191]: (array([39], dtype=int64),)

These results are strongly dependent on the shape of the histogram (if you change just one of the parameters used to generate data the range of valid values for order may vary). To make mode detection more robust, I would recommend you to pass a smoothed histogram to argrelmax rather than the original histogram.

like image 192
Tonechas Avatar answered Sep 28 '22 13:09

Tonechas


I guess, you want to find second largest number in y_max. Hope this example will help you:

np.random.seed(4)  # for reproducibility
data = np.zeros(0)
for i in xrange(10):
    data = np.hstack(( data, np.random.normal(i, 0.25, 100*i) ))

# data histogram
n, bins = np.histogram(data, 100, normed=True)

# trim data
x = np.linspace(np.min(data), np.max(data), num=100)

# find index of minimum between two modes
ind_max = argrelmax(n)
x_max = x[ind_max]
y_max = n[ind_max]

# find first and second max values in y_max
index_first_max = np.argmax(y_max)
maximum_y = y_max[index_first_max]
second_max_y = max(n for n in y_max if n!=maximum_y)
index_second_max = np.where(y_max == second_max_y)

# plot
plt.hist(data, bins=100, normed=True, color='y')
plt.scatter(x_max, y_max, color='b')
plt.scatter(x_max[index_first_max], y_max[index_first_max], color='r')
plt.scatter(x_max[index_second_max], y_max[index_second_max], color='g')
plt.show()

enter image description here

like image 32
enclis Avatar answered Sep 28 '22 13:09

enclis