Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a faster-than-linear way to find endpoints of a boolean condition in numpy?

Tags:

numpy

scipy

Does anybody know of a (common case) faster-than-linear way to find the endpoints of a boolean property of an array.

For example numpy.nonzero(a)[0][-1] is the index of the last nonzero element of a (dimension=0), and similarly numpy.nonzero(a)[0][0] is the index of the first nonzero element.

If we know that we only care about the first or last element we can use less memory and have better common-case runtime than running "nonzero" like above. For example if we stick with a linear search we can at least start at the appropriate end (search backwards to find last value matching a condition). Or we might use a binary search (e.g. if middle element matches condition we don't need to check 1st half to find the last element where it's true). This seems common enough that there may be an existing implementation but I haven't found anything like it.

like image 216
Joseph Hastings Avatar asked Mar 05 '12 18:03

Joseph Hastings


1 Answers

You can find the first True element of a boolean array using argmax.

a = np.array([False, False, True, True, True])
first_True = a.argmax()
last_True = len(a) - 1 - a[::-1].argmax()

You can use argmin to find the False values, and this will be faster and take less memory than using nonzero, but this is linear in the length of a. If you want to be faster than linear you need to know that a is "sorted", for a boolean array that means you have a block of False followed by all True. In that case you could use search sorted to find the boundary between the False and the true:

first_True = a.searchsorted(True, 'left')
like image 91
Bi Rico Avatar answered Oct 13 '22 01:10

Bi Rico