Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

calculate histogram peaks in python

Tags:

python

In Python, how do I calcuate the peaks of a histogram?

I tried this:

import numpy as np
from scipy.signal import argrelextrema

data = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 1, 2, 3, 4,

        5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9,

        12,

        15, 16, 17, 18, 19, 15, 16, 17, 18, 

        19, 20, 21, 22, 23, 24,]

h = np.histogram(data, bins=[0, 5, 10, 15, 20, 25])
hData = h[0]
peaks = argrelextrema(hData, np.greater)

But the result was:

(array([3]),)

I'd expect it to find the peaks in bin 0 and bin 3.

Note that the peaks span more than 1 bin. I don't want it to consider the peaks that span more than 1 column as additional peak.

I'm open to another way to get the peaks.

Note:

>>> h[0]
array([19, 15,  1, 10,  5])
>>> 
like image 250
brian Avatar asked Aug 10 '15 01:08

brian


People also ask

How do you find the peak value in Python?

Practical Data Science using Python Suppose we have an input array nums, where nums[i] ≠ nums[i+1], search for a peak element and return its index. The array can hold multiple peak elements, in that case return the index to any one of the peak elements. We can imagine that nums[-1] = nums[n] = -∞.

How do you get data from a histogram in Python?

The plt() function present in pyplot submodule of Matplotlib takes the array of dataset and array of bin as parameter and creates a histogram of the corresponding data values.

What are the values returned by NP histogram ()?

The Numpy histogram function doesn't draw the histogram, but it computes the occurrences of input data that fall within each bin, which in turns determines the area (not necessarily the height if the bins aren't of equal width) of each bar. There are 3 bins, for values ranging from 0 to 1 (excl 1.), 1 to 2 (excl.


2 Answers

In computational topology, the formalism of persistent homology provides a definition of "peak" that seems to address your need. In the 1-dimensional case the peaks are illustrated by the blue bars in the following figure:

Most persistent peaks

A description of the algorithm is given in this Stack Overflow answer of a peak detection question.

The nice thing is that this method not only identifies the peaks but it quantifies the "significance" in a natural way.

A simple and efficient implementation (as fast as sorting numbers) and the source material to the above answer given in this blog article: https://www.sthu.org/blog/13-perstopology-peakdetection/index.html

like image 87
S. Huber Avatar answered Oct 07 '22 14:10

S. Huber


I wrote an easy function:

def find_peaks(a):
  x = np.array(a)
  max = np.max(x)
  lenght = len(a)
  ret = []
  for i in range(lenght):
      ispeak = True
      if i-1 > 0:
          ispeak &= (x[i] > 1.8 * x[i-1])
      if i+1 < lenght:
          ispeak &= (x[i] > 1.8 * x[i+1])

      ispeak &= (x[i] > 0.05 * max)
      if ispeak:
          ret.append(i)
  return ret

I defined a peak as a value bigger than 180% that of the neighbors and bigger than 5% of the max value. Of course you can adapt the values as you prefer in order to find the best set up for your problem.

like image 43
Andrea Mauro Avatar answered Oct 07 '22 13:10

Andrea Mauro