Removing completely isolated cells from Python array?

I'm trying to reduce noise in a binary NumPy array by removing all completely isolated single cells, i.e. setting cells with a value of 1 to 0 if all of their neighbours are 0. I have a working solution that labels the blobs and removes those with a size of 1 in a loop, but this seems very inefficient for large arrays:

import numpy as np
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt    

# Generate sample data
square = np.zeros((32, 32))
square[10:-10, 10:-10] = 1
np.random.seed(12)
x, y = (32*np.random.random((2, 20))).astype(int)
square[x, y] = 1

# Plot original data with many isolated single cells
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

# Assign unique labels
id_regions, number_of_ids = ndimage.label(square, structure=np.ones((3,3)))

# Set blobs of size 1 to 0
for i in range(number_of_ids + 1):
    if id_regions[id_regions==i].size == 1:
        square[id_regions==i] = 0

# Plot desired output, with all isolated single cells removed
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

In this case, eroding and dilating my array won't work as it will also remove features with a width of 1. I feel the solution lies somewhere within the scipy.ndimage package, but so far I haven't been able to crack it. Any help would be greatly appreciated!


asked Feb 02 '15 09:02

Robbi Bishop-Taylor

2 Answers

A belated thanks to both Jaime and Kazemakase for their replies. The manual neighbour-checking method did remove all isolated patches, but also removed patches attached to others by one corner (i.e. to the upper-right of the square in the sample array). The summed area table works perfectly and is very fast on the small sample array, but slows down on larger arrays.

I ended up following an approach using ndimage, which seems to work efficiently for very large and sparse arrays (0.91 sec for a 5000 x 5000 array vs 1.17 sec for the summed area table approach). I first generated a labelled array of unique IDs for each discrete region, calculated the size of each ID, masked the size array to keep only blobs with size == 1, then indexed the original array and set the cells belonging to those IDs to 0:

def filter_isolated_cells(array, struct):
    """ Return array with completely isolated single cells removed
    :param array: Array with completely isolated single cells
    :param struct: Structure array for generating unique regions
    :return: Array with minimum region size > 1
    """

    filtered_array = np.copy(array)
    id_regions, num_ids = ndimage.label(filtered_array, structure=struct)
    id_sizes = np.array(ndimage.sum(array, id_regions, range(num_ids + 1)))
    area_mask = (id_sizes == 1)
    filtered_array[area_mask[id_regions]] = 0
    return filtered_array

# Run function on sample array
filtered_array = filter_isolated_cells(square, struct=np.ones((3,3)))

# Plot output, with all isolated single cells removed
plt.imshow(filtered_array, cmap=plt.cm.gray, interpolation='nearest')

Result: [image: resulting array]
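
For readers who want to check the behaviour on something small, here is a minimal sketch of the same label-and-mask idea. It is an assumption-laden variant, not the function above: the name remove_small_regions and the min_size parameter are hypothetical, and np.bincount is swapped in for ndimage.sum to count cells per label.

```python
import numpy as np
from scipy import ndimage


def remove_small_regions(array, min_size=2, structure=None):
    """Zero out connected regions with fewer than min_size cells.

    Hypothetical helper for illustration; min_size=2 removes only
    completely isolated single cells.
    """
    if structure is None:
        # 3x3 block of ones: diagonal neighbours count as connected
        structure = np.ones((3, 3))
    labels, num_labels = ndimage.label(array, structure=structure)
    # Number of cells carrying each label; index 0 is the background
    sizes = np.bincount(labels.ravel(), minlength=num_labels + 1)
    too_small = sizes < min_size
    too_small[0] = False  # never modify the background
    result = array.copy()
    result[too_small[labels]] = 0
    return result


demo = np.array([
    [1, 0, 0, 0, 0],   # isolated cell in the corner
    [0, 0, 0, 1, 0],   # touches the cell below-left by one corner
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],   # a pair on the left, an isolated cell on the right
])
cleaned = remove_small_regions(demo)
print(cleaned)
```

With the default 3x3 structure, the two cells that touch only by a corner are treated as one region of size 2 and are kept, while the two truly isolated cells are cleared.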


answered Sep 28 '22 05:09

Robbi Bishop-Taylor

You can manually check the neighbors and avoid the loop using vectorization.

has_neighbor = np.zeros(square.shape, bool)
has_neighbor[:, 1:] = np.logical_or(has_neighbor[:, 1:], square[:, :-1] > 0)  # left
has_neighbor[:, :-1] = np.logical_or(has_neighbor[:, :-1], square[:, 1:] > 0)  # right
has_neighbor[1:, :] = np.logical_or(has_neighbor[1:, :], square[:-1, :] > 0)  # above
has_neighbor[:-1, :] = np.logical_or(has_neighbor[:-1, :], square[1:, :] > 0)  # below

square[np.logical_not(has_neighbor)] = 0

That way the loop over the array is performed internally by NumPy, which is considerably more efficient than looping in Python. There are two drawbacks of this solution:

If your array is very sparse there may be more efficient ways to check the neighborhood of non-zero points.
If your array is very large the has_neighbor array might consume too much memory. In this case you could loop over sub-arrays of smaller size (a trade-off between Python loops and vectorization).

I have no experience with ndimage, so there may be a better solution built in somewhere.
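
One possible ndimage-based variant of the neighbour check is sketched below. Note that the slicing code above only looks left, right, above and below, so a cell attached to another only by a corner is treated as isolated. This sketch counts all eight neighbours with scipy.ndimage.convolve instead; the function name remove_isolated_cells_8 is hypothetical, and treating everything outside the array as empty is an assumption.

```python
import numpy as np
from scipy import ndimage


def remove_isolated_cells_8(array):
    """Zero cells that have no non-zero cell among their 8 neighbours."""
    binary = (array > 0).astype(int)
    kernel = np.ones((3, 3), dtype=int)
    kernel[1, 1] = 0  # do not count the cell itself
    # mode='constant' with cval=0 pads the border with empty cells
    neighbour_count = ndimage.convolve(binary, kernel, mode='constant', cval=0)
    result = array.copy()
    result[(binary == 1) & (neighbour_count == 0)] = 0
    return result


demo = np.array([
    [1, 0, 0, 0, 0],   # isolated cell in the corner
    [0, 0, 0, 1, 0],   # touches the cell below-left by one corner
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],   # a pair on the left, an isolated cell on the right
])
filtered = remove_isolated_cells_8(demo)
print(filtered)
```

The neighbour count is a full-size integer array, so the memory concern mentioned above still applies to very large inputs.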

answered Sep 28 '22 07:09

MB-F

Tags:

python

numpy

python-2.6

scipy

ndimage
