Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find span where condition is True using NumPy

Imagine I have a numpy array and I need to find the spans/ranges where that condition is True. For example, I have the following array in which I'm trying to find spans where items are greater than 1:

[0, 0, 0, 2, 2, 0, 2, 2, 2, 0]

I would need to find indices (start, stop):

(3, 5) 
(6, 9)

The fastest thing I've been able to implement is making a boolean array of:

truth = data > threshold

and then looping through the array using numpy.argmin and numpy.argmax to find start and end positions.

    pos = 0
    truth = container[RATIO,:] > threshold

    while pos < len(truth):
        start = numpy.argmax(truth[pos:]) + pos + offset
        end = numpy.argmin(truth[start:]) + start  + offset
        if not truth[start]:#nothing more
            break
        if start == end:#goes to the end
            end = len(truth)
        pos = end

But this has been too slow for the billions of positions in my arrays and the fact that the spans I'm finding are usually just a few positions in a row. Does anyone know a faster way to find these spans?

like image 942
ACEnglish Avatar asked Mar 24 '23 13:03

ACEnglish


2 Answers

How's one way. First take the boolean array you have:

In [11]: a
Out[11]: array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])

In [12]: a1 = a > 1

Shift it one to the left (to get the next state at each index) using roll:

In [13]: a1_rshifted = np.roll(a1, 1)

In [14]: starts = a1 & ~a1_rshifted  # it's True but the previous isn't

In [15]: ends = ~a1 & a1_rshifted

Where this is non-zero is the start of each True batch (or, respectively, end batch):

In [16]: np.nonzero(starts)[0], np.nonzero(ends)[0]
Out[16]: (array([3, 6]), array([5, 9]))

And zipping these together:

In [17]: zip(np.nonzero(starts)[0], np.nonzero(ends)[0])
Out[17]: [(3, 5), (6, 9)]
like image 85
Andy Hayden Avatar answered Mar 29 '23 00:03

Andy Hayden


If you have access to the scipy library:

You can use scipy.ndimage.measurements.label to identify any regions of non zero value. it returns an array where the value of each element is the id of a span or range in the original array.

You can then use scipy.ndimage.measurements.find_objects to return the slices you would need to extract those ranges. You can access the start / end values directly from those slices.

In your example:

import numpy
from scipy.ndimage.measurements import label, find_objects

data = numpy.array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])

labels, number_of_regions = label(data)
ranges = find_objects(labels)

for identified_range in ranges:
    print(identified_range[0].start, identified_range[0].stop)

You should see:

3 5
6 9

Hope this helps!

like image 35
djspoulter Avatar answered Mar 29 '23 00:03

djspoulter