
Fast peak-finding and centroiding in python

I am trying to develop a fast algorithm in Python for finding peaks in an image and then finding the centroids of those peaks. I have written the following code using scipy.ndimage.label and scipy.ndimage.find_objects to locate the objects. This seems to be the bottleneck in the code: it takes about 7 ms to locate 20 objects in a 500x500 image. I would like to scale this up to a larger (2000x2000) image, but then the time increases to almost 100 ms. So I'm wondering if there is a faster option.

Here is the code that I have so far, which works, but is slow. First I simulate my data using some gaussian peaks. This part is slow, but in practice I will be using real data, so I don't care too much about speeding that part up. I would like to be able to find the peaks very quickly.

import time
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
import matplotlib.patches 

plt.figure(figsize=(10,10))
ax1 = plt.subplot(221)
ax2 = plt.subplot(222)
ax3 = plt.subplot(223)
ax4 = plt.subplot(224)

size        = 500 #width and height of image in pixels
peak_height = 100 # define the height of the peaks
num_peaks   = 20
noise_level = 50
threshold   = 60

np.random.seed(3)

#set up a simple, blank image (Z)
x = np.linspace(0,size,size)
y = np.linspace(0,size,size)

X,Y = np.meshgrid(x,y)
Z = X*0

#now add some peaks
def gaussian(X,Y,xo,yo,amp=100,sigmax=4,sigmay=4):
    return amp*np.exp(-(X-xo)**2/(2*sigmax**2) - (Y-yo)**2/(2*sigmay**2))

for xo,yo in size*np.random.rand(num_peaks,2):
    widthx = 5 + np.random.randn(1)
    widthy = 5 + np.random.randn(1)
    Z += gaussian(X,Y,xo,yo,amp=peak_height,sigmax=widthx,sigmay=widthy)

#of course, add some noise:
Z = Z + scipy.ndimage.gaussian_filter(0.5*noise_level*np.random.rand(size,size),sigma=5)    
Z = Z + scipy.ndimage.gaussian_filter(0.5*noise_level*np.random.rand(size,size),sigma=1)    

t = time.time() #Start timing the peak-finding algorithm

#Set everything below the threshold to zero:
Z_thresh = np.copy(Z)
Z_thresh[Z_thresh<threshold] = 0
print('Time after thresholding: %.5f seconds'%(time.time()-t))

#now find the objects
labeled_image, number_of_objects = scipy.ndimage.label(Z_thresh)
print('Time after labeling: %.5f seconds'%(time.time()-t))

peak_slices = scipy.ndimage.find_objects(labeled_image)
print('Time after finding objects: %.5f seconds'%(time.time()-t))

def centroid(data):
    h,w = np.shape(data)   
    x = np.arange(0,w)
    y = np.arange(0,h)

    X,Y = np.meshgrid(x,y)

    cx = np.sum(X*data)/np.sum(data)
    cy = np.sum(Y*data)/np.sum(data)

    return cx,cy

centroids = []

for peak_slice in peak_slices:
    dy,dx  = peak_slice
    x,y = dx.start, dy.start
    cx,cy = centroid(Z_thresh[peak_slice])
    centroids.append((x+cx,y+cy))

print('Total time: %.5f seconds\n'%(time.time()-t))

###########################################
#Now make the plots:
for ax in (ax1,ax2,ax3,ax4): ax.clear()
ax1.set_title('Original image')
ax1.imshow(Z,origin='lower')

ax2.set_title('Thresholded image')
ax2.imshow(Z_thresh,origin='lower')

ax3.set_title('Labeled image')
ax3.imshow(labeled_image,origin='lower') #display the color-coded regions

for peak_slice in peak_slices:  #Draw some rectangles around the objects
    dy,dx  = peak_slice
    xy     = (dx.start, dy.start)
    width  = (dx.stop - dx.start)  #slice.stop is exclusive, so no +1 is needed
    height = (dy.stop - dy.start)
    rect = matplotlib.patches.Rectangle(xy,width,height,fc='none',ec='red')
    ax3.add_patch(rect)

ax4.set_title('Centroids on original image')
ax4.imshow(Z,origin='lower')

for x,y in centroids:
    ax4.plot(x,y,'kx',ms=10)

ax4.set_xlim(0,size)
ax4.set_ylim(0,size)

plt.tight_layout()
plt.show()

The results for size=500: (image not included)

EDIT: If the number of peaks is large (~100) and the size of the image is small, then the bottleneck is actually the centroiding part. So, perhaps the speed of this part also needs to be optimized.

DanHickstein asked Oct 01 '13 17:10


3 Answers

Your method for finding the peaks (simple thresholding) is of course very sensitive to the choice of threshold: set it too low and you'll "detect" things that are not peaks; set it too high and you'll miss valid peaks.

There are more robust alternatives that will detect all the local maxima in the image intensity regardless of their intensity value. My preferred one is applying a dilation with a small (5x5 or 7x7) structuring element, then finding the pixels where the original image and its dilated version have the same value. This works because, by definition, dilation(x, y, E, img) = { max of img within E centered at pixel (x,y) }, and therefore dilation(x, y, E, img) = img(x, y) whenever (x,y) is the location of a local maximum at the scale of E.
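The dilation test described above can be sketched with SciPy rather than OpenCV. This is only a minimal sketch, assuming that scipy.ndimage.maximum_filter is an acceptable stand-in for a flat grey-scale dilation; the function name local_maxima, the optional min_height floor and the 5x5 default are illustrative choices that do not come from the answer.

```python
import numpy as np
import scipy.ndimage


def local_maxima(img, size=5, min_height=None):
    """Return (row, col) coordinates of pixels that equal the maximum
    of their size x size neighborhood."""
    # A dilation with a flat square structuring element is a moving maximum.
    dilated = scipy.ndimage.maximum_filter(img, size=size, mode='nearest')
    is_peak = (img == dilated)
    if min_height is not None:
        # Optional floor so flat background regions are not reported as maxima.
        is_peak &= img > min_height
    return np.argwhere(is_peak)


# Tiny demo: two isolated bumps on a zero background.
demo = np.zeros((20, 20))
demo[5, 7] = 3.0
demo[14, 2] = 5.0
peaks = local_maxima(demo, size=5, min_height=0)
print(peaks)
```

Note that every pixel of a flat plateau passes the equality test, so on real data some floor or tie-breaking rule is probably needed.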

With a fast implementation of the morphological operators (e.g. the one in OpenCV) this algorithm is linear in the size of the image in both space and time (one extra image-sized buffer for the dilated image, and one pass on both). In a pinch, it can also be implemented on-line without the extra buffer and a little extra complexity, and it's still linear time.

To further robustify it in the presence of salt-and-pepper or similar noise, which may introduce many false maxima, you can apply the method twice, with structuring elements of different size (say, 5x5 and 7x7), then retain only the stable maxima, where stability can be defined by unchanging position of the maxima, or by position not changing by more than one pixel, etc. Additionally, you may want to suppress low nearby maxima when you have reason to believe they are due to noise. An efficient way to do this is to first detect all the local maxima as above, sort them descending by height, then go down the sorted list and keep them if their value in the image has not changed and, if they are kept, set to zero all the pixels in a (2d+1) x (2d+1) neighborhood of them, where d is the min distance between nearby maxima that you are willing to tolerate.
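The suppression step in the last sentence above might look roughly like the following. It is a sketch under assumptions: candidates are found with scipy.ndimage.maximum_filter, only positive pixels count as candidates, and the name suppress_nearby_maxima and its defaults are hypothetical. The two-scale stability check is not covered here.

```python
import numpy as np
import scipy.ndimage


def suppress_nearby_maxima(img, size=5, d=3):
    """Greedy suppression: keep the tallest maxima and drop lower ones
    that fall inside the (2d+1) x (2d+1) neighborhood of a kept maximum."""
    work = np.array(img, dtype=float)  # working copy, zeroed around kept peaks
    # Candidate maxima: positive pixels equal to their moving maximum.
    is_max = (work == scipy.ndimage.maximum_filter(work, size=size)) & (work > 0)
    candidates = np.argwhere(is_max)
    heights = work[candidates[:, 0], candidates[:, 1]]
    order = np.argsort(heights)[::-1]  # candidate indices, tallest first
    kept = []
    for (r, c), h in zip(candidates[order], heights[order]):
        if work[r, c] != h:
            continue  # value changed: already zeroed by a taller neighbor
        kept.append((int(r), int(c)))
        # Zero the neighborhood of the kept peak, clipped at the image border.
        work[max(r - d, 0):r + d + 1, max(c - d, 0):c + d + 1] = 0
    return kept


# Demo: a tall peak, a lower peak 3 pixels away, and a distant peak.
demo = np.zeros((40, 40))
demo[10, 10] = 10.0
demo[10, 13] = 6.0
demo[30, 30] = 8.0
kept = suppress_nearby_maxima(demo, size=3, d=4)
print(kept)
```

Here the lower peak at (10, 13) lies within d = 4 pixels of the taller one at (10, 10), so it is dropped, while the distant peak survives.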

Francesco Callari answered Oct 17 '22 00:10


If you have many peaks, it is faster to use scipy.ndimage.center_of_mass. You can replace your code, from the definition of peak_slices up to the printing of the total time, with the following two statements:

centroids = scipy.ndimage.center_of_mass(Z_thresh, labeled_image,
                                         np.arange(1, number_of_objects + 1))
centroids = [(j, i) for i, j in centroids]

For num_peaks = 20 this runs about 3x slower than your approach, but for num_peaks = 100 it runs about 10x faster. So your best option will depend on your actual data.
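To make the coordinate swap in the second statement concrete, here is a small self-contained check on a made-up image (the array img and its two blobs are purely illustrative). center_of_mass returns coordinates in array order (row, column), so they are swapped to get (x, y) as plotted in the question.

```python
import numpy as np
import scipy.ndimage

# Small thresholded image with two separate blobs.
img = np.zeros((8, 10))
img[1:3, 2:4] = 1.0   # uniform 2x2 blob: rows 1-2, cols 2-3
img[5, 6:9] = 2.0     # uniform 1x3 blob: row 5, cols 6-8

labels, n = scipy.ndimage.label(img)
# One (row, col) tuple is returned per requested label.
rc = scipy.ndimage.center_of_mass(img, labels, np.arange(1, n + 1))
# Swap to (x, y) = (col, row) to match the plotting convention.
xy = [(float(c), float(r)) for r, c in rc]
print(xy)
```

One difference worth keeping in mind: center_of_mass only weights pixels that carry the given label, whereas the centroid() loop in the question uses every nonzero pixel inside each bounding box, so the two can differ slightly when bounding boxes overlap.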

Jaime answered Oct 17 '22 01:10


Another approach is to avoid sum(), meshgrid() and the like, and replace everything with straight linear algebra.

def centroid2(data):
    h, w = data.shape
    x = np.arange(h)        # row indices
    y = np.arange(w)        # column indices
    x1 = np.ones((1, h))
    y1 = np.ones((w, 1))
    total = np.dot(np.dot(x1, data), y1)   # sum of all pixels, shape (1, 1)
    # be careful, it returns two arrays (cx, cy), each of shape (1, 1)
    return (np.dot(np.dot(x1, data), y) / total,
            np.dot(np.dot(x, data), y1) / total)

This can be extended to higher dimensions as well. It gives about a 60% speedup compared to centroid().
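As a sanity check on the linear-algebra formulation, the sketch below compares a meshgrid-based centroid with a dot-product version on random data. The function names centroid_meshgrid and centroid_dot are hypothetical; centroid_dot is a variant of the idea above that returns plain floats instead of arrays, and no timing claim is made for it.

```python
import numpy as np


def centroid_meshgrid(data):
    """Intensity-weighted centroid using explicit coordinate grids."""
    h, w = data.shape
    X, Y = np.meshgrid(np.arange(w), np.arange(h))
    total = data.sum()
    return (X * data).sum() / total, (Y * data).sum() / total


def centroid_dot(data):
    """Same centroid computed with matrix-vector products only."""
    h, w = data.shape
    col_totals = np.dot(np.ones(h), data)   # shape (w,): intensity per column
    row_totals = np.dot(data, np.ones(w))   # shape (h,): intensity per row
    total = np.dot(col_totals, np.ones(w))  # scalar: sum of all pixels
    cx = np.dot(col_totals, np.arange(w)) / total
    cy = np.dot(row_totals, np.arange(h)) / total
    return float(cx), float(cy)


rng = np.random.RandomState(0)
patch = rng.rand(12, 17)
cx_ref, cy_ref = centroid_meshgrid(patch)
cx_dot, cy_dot = centroid_dot(patch)
print(abs(cx_ref - cx_dot), abs(cy_ref - cy_dot))
```

Both functions return (cx, cy) with x along columns and y along rows, matching centroid() in the question.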

CT Zhu answered Oct 17 '22 00:10