Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy first occurrence of value greater than existing value

Tags:

python

numpy

This is a little faster (and looks nicer)

np.argmax(aa>5)

Since argmax will stop at the first True ("In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.") and doesn't save another list.

In [2]: N = 10000

In [3]: aa = np.arange(-N,N)

In [4]: timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop

In [5]: timeit np.where(aa>N/2)[0][0]
10000 loops, best of 3: 141 us per loop

In [6]: timeit np.nonzero(aa>N/2)[0][0]
10000 loops, best of 3: 142 us per loop

given the sorted content of your array, there is an even faster method: searchsorted.

import time
N = 10000
aa = np.arange(-N,N)
%timeit np.searchsorted(aa, N/2)+1
%timeit np.argmax(aa>N/2)
%timeit np.where(aa>N/2)[0][0]
%timeit np.nonzero(aa>N/2)[0][0]

# Output
100000 loops, best of 3: 5.97 µs per loop
10000 loops, best of 3: 46.3 µs per loop
10000 loops, best of 3: 154 µs per loop
10000 loops, best of 3: 154 µs per loop

I was also interested in this and I've compared all the suggested answers with perfplot. (Disclaimer: I'm the author of perfplot.)

If you know that the array you're looking through is already sorted, then

numpy.searchsorted(a, alpha)

is for you. It's O(log(n)) operation, i.e., the speed hardly depends on the size of the array. You can't get faster than that.

If you don't know anything about your array, you're not going wrong with

numpy.argmax(a > alpha)

Already sorted:

enter image description here

Unsorted:

enter image description here

Code to reproduce the plot:

import numpy
import perfplot


alpha = 0.5
numpy.random.seed(0)


def argmax(data):
    return numpy.argmax(data > alpha)


def where(data):
    return numpy.where(data > alpha)[0][0]


def nonzero(data):
    return numpy.nonzero(data > alpha)[0][0]


def searchsorted(data):
    return numpy.searchsorted(data, alpha)


perfplot.save(
    "out.png",
    # setup=numpy.random.rand,
    setup=lambda n: numpy.sort(numpy.random.rand(n)),
    kernels=[argmax, where, nonzero, searchsorted],
    n_range=[2 ** k for k in range(2, 23)],
    xlabel="len(array)",
)

In [34]: a=np.arange(-10,10)

In [35]: a
Out[35]:
array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
         3,   4,   5,   6,   7,   8,   9])

In [36]: np.where(a>5)
Out[36]: (array([16, 17, 18, 19]),)

In [37]: np.where(a>5)[0][0]
Out[37]: 16