
Removing completely isolated cells from Python array?

I'm trying to reduce noise in a binary NumPy array by removing all completely isolated single cells, i.e. setting a cell with value 1 to 0 if all of its neighbours are 0. I have a working solution that labels the blobs and removes those of size 1 in a loop, but this seems very inefficient for large arrays:

import numpy as np
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt    

# Generate sample data
square = np.zeros((32, 32))
square[10:-10, 10:-10] = 1
np.random.seed(12)
x, y = (32*np.random.random((2, 20))).astype(int)  # np.int was removed in NumPy 1.24
square[x, y] = 1

# Plot original data with many isolated single cells
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

# Assign unique labels
id_regions, number_of_ids = ndimage.label(square, structure=np.ones((3,3)))

# Set blobs of size 1 to 0
for i in range(1, number_of_ids + 1):  # range replaces Python 2's xrange; label 0 is the background
    if id_regions[id_regions==i].size == 1:
        square[id_regions==i] = 0

# Plot desired output, with all isolated single cells removed
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

In this case, eroding and dilating my array won't work as it will also remove features with a width of 1. I feel the solution lies somewhere within the scipy.ndimage package, but so far I haven't been able to crack it. Any help would be greatly appreciated!

Robbi Bishop-Taylor asked Feb 02 '15 09:02


2 Answers

A belated thanks to both Jaime and Kazemakase for their replies. The manual neighbour-checking method did remove all isolated patches, but also removed patches attached to others by one corner (i.e. to the upper-right of the square in the sample array). The summed area table works perfectly and is very fast on the small sample array, but slows down on larger arrays.
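The summed area table reply is not reproduced on this page, so here is a rough sketch of how that idea could look. It is my own reconstruction under stated assumptions, not the code that was timed above: the function name remove_isolated_sat and the demo array are made up for illustration, and the input is assumed to be a 2-D array where any value > 0 counts as set.

```python
import numpy as np

def remove_isolated_sat(array):
    """Zero out set cells whose 3x3 neighbourhood holds no other set cell."""
    binary = (np.asarray(array) > 0).astype(np.int64)
    # Pad by one cell so border cells also get a full 3x3 window.
    padded = np.pad(binary, 1, mode="constant")
    # Summed area table with a leading row and column of zeros:
    # sat[i, j] is the sum of padded[:i, :j].
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Sum of every 3x3 window from four table lookups.
    window = sat[3:, 3:] - sat[:-3, 3:] - sat[3:, :-3] + sat[:-3, :-3]
    result = np.array(array, copy=True)
    # A window sum of 1 centred on a set cell means that cell is alone.
    result[(binary == 1) & (window == 1)] = 0
    return result

# Hypothetical demo: (0, 0) is isolated, (1, 3) and (2, 2) touch at a corner.
demo = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 0, 0, 0]])
print(remove_isolated_sat(demo))
```

Because the window includes the diagonals, the two corner-attached cells survive and only the lone cell at (0, 0) is cleared.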

I ended up following an approach using ndimage, which seems to work efficiently for very large and sparse arrays (0.91 sec for a 5000 x 5000 array vs 1.17 sec for the summed area table approach). I first generated a labelled array of unique IDs for each discrete region, calculated the size of each ID, masked the size array to keep only blobs with size == 1, then indexed the original array and set the cells belonging to those IDs to 0:

def filter_isolated_cells(array, struct):
    """ Return array with completely isolated single cells removed
    :param array: Array with completely isolated single cells
    :param struct: Structure array for generating unique regions
    :return: Array with minimum region size > 1
    """

    filtered_array = np.copy(array)
    id_regions, num_ids = ndimage.label(filtered_array, structure=struct)
    id_sizes = np.array(ndimage.sum(array, id_regions, range(num_ids + 1)))
    area_mask = (id_sizes == 1)
    filtered_array[area_mask[id_regions]] = 0
    return filtered_array

# Run function on sample array
filtered_array = filter_isolated_cells(square, struct=np.ones((3,3)))

# Plot output, with all isolated single cells removed
plt.imshow(filtered_array, cmap=plt.cm.gray, interpolation='nearest')
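To see what the struct argument changes, here is a small self-contained variant. It is a sketch, not a replacement for the function above: the name filter_isolated_cells_bincount and the demo array are hypothetical, and it counts cells per label with np.bincount instead of ndimage.sum, on the assumption that you want region sizes in cells even if the array holds values other than 0 and 1.

```python
import numpy as np
from scipy import ndimage

def filter_isolated_cells_bincount(array, struct):
    """Remove single-cell regions; region sizes are counted in cells."""
    binary = np.asarray(array) > 0
    id_regions, num_ids = ndimage.label(binary, structure=struct)
    # Number of cells carrying each label (index 0 is the background).
    id_sizes = np.bincount(id_regions.ravel(), minlength=num_ids + 1)
    area_mask = id_sizes == 1
    area_mask[0] = False  # never treat the background as a blob
    filtered = np.array(array, copy=True)
    filtered[area_mask[id_regions]] = 0
    return filtered

# Hypothetical demo: (0, 0) is isolated, (1, 3) and (2, 2) touch at a corner.
demo = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 0, 0, 0]])

# 3x3 structure: diagonal contact counts, so the corner pair is kept.
eight = filter_isolated_cells_bincount(demo, np.ones((3, 3)))
# Cross-shaped structure: only edge contact counts, so the pair is removed.
four = filter_isolated_cells_bincount(demo, ndimage.generate_binary_structure(2, 1))
print(eight)
print(four)
```

With np.ones((3,3)) only the lone cell is cleared; with the cross-shaped structure all three cells go, which matches the corner problem I described for the manual neighbour check.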

Robbi Bishop-Taylor answered Sep 28 '22 05:09


You can manually check the four orthogonal neighbors of every cell (left, right, above, below) and avoid the Python loop by using vectorization.

has_neighbor = np.zeros(square.shape, bool)
has_neighbor[:, 1:] = np.logical_or(has_neighbor[:, 1:], square[:, :-1] > 0)  # left
has_neighbor[:, :-1] = np.logical_or(has_neighbor[:, :-1], square[:, 1:] > 0)  # right
has_neighbor[1:, :] = np.logical_or(has_neighbor[1:, :], square[:-1, :] > 0)  # above
has_neighbor[:-1, :] = np.logical_or(has_neighbor[:-1, :], square[1:, :] > 0)  # below

square[np.logical_not(has_neighbor)] = 0

That way the looping over the square is performed internally by NumPy, which is rather more efficient than looping in Python. There are two drawbacks to this solution:

  1. If your array is very sparse, there may be more efficient ways to check the neighborhood of the non-zero points.
  2. If your array is very large, the has_neighbor array might consume too much memory. In this case you could loop over smaller sub-arrays (a trade-off between Python loops and vectorization).
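For the sparse case in point 1, one possible sketch is to look only at the coordinates of the non-zero cells. The function name remove_isolated_sparse and the include_diagonals flag are assumptions of mine for illustration; note that the padded copy is still a full-size boolean array, so this mainly saves work, not memory.

```python
import numpy as np

def remove_isolated_sparse(array, include_diagonals=True):
    """Check neighbors only at the non-zero cells of a 2-D array."""
    binary = np.asarray(array) > 0
    # One-cell border of False so edge cells can be indexed safely.
    padded = np.pad(binary, 1, mode="constant")
    rows, cols = np.nonzero(binary)
    has_neighbor = np.zeros(rows.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            if not include_diagonals and dr != 0 and dc != 0:
                continue  # skip corner neighbors for 4-connectivity
            # +1 converts original coordinates to padded coordinates.
            has_neighbor |= padded[rows + 1 + dr, cols + 1 + dc]
    result = np.array(array, copy=True)
    result[rows[~has_neighbor], cols[~has_neighbor]] = 0
    return result

# Hypothetical demo: (0, 0) is isolated, (1, 3) and (2, 2) touch at a corner.
demo = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 0, 0, 0]])
print(remove_isolated_sparse(demo))
print(remove_isolated_sparse(demo, include_diagonals=False))
```

The Python loop here runs at most eight times regardless of array size; each pass is a vectorized lookup over the non-zero cells only.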

I have no experience with ndimage, so there may be a better solution built in somewhere.
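One built-in candidate is scipy.ndimage.convolve with a 3x3 kernel that counts the eight surrounding cells. This is only a sketch, assuming a 2-D array where any value > 0 counts as set; the function name remove_isolated_convolve is made up for illustration.

```python
import numpy as np
from scipy import ndimage

def remove_isolated_convolve(array):
    """Zero out set cells that have no set cell among their 8 neighbors."""
    binary = (np.asarray(array) > 0).astype(np.int64)
    # Kernel of ones with a zero centre: counts neighbors, not the cell itself.
    kernel = np.ones((3, 3), dtype=np.int64)
    kernel[1, 1] = 0
    # mode="constant" with cval=0 treats everything outside the array as empty.
    neighbor_count = ndimage.convolve(binary, kernel, mode="constant", cval=0)
    result = np.array(array, copy=True)
    result[(binary == 1) & (neighbor_count == 0)] = 0
    return result

# Hypothetical demo: (0, 0) is isolated, (1, 3) and (2, 2) touch at a corner.
demo = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 0, 0, 0]])
print(remove_isolated_convolve(demo))
```

Unlike the four-direction check above, this kernel includes the diagonals, so cells that touch only at a corner are kept.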

MB-F answered Sep 28 '22 07:09