Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find number of zeros before non-zero in a numpy array

I have a numpy array A. I would like to return the number of zeros before a non-zero in A in an efficient way as it is in a loop.

If A = np.array([0,1,2]) then np.nonzero(A)[0][0] returns 1. However if A = np.array([0,0,0]) this doesn't work (I would like the answer 3 in this case). And also if A is very big and the first non-zero is near the beginning this seems inefficient.

like image 394
graffe Avatar asked Apr 23 '14 15:04

graffe


2 Answers

Here's an iterative Cython version, which may be your best bet if this is a serious bottleneck

# saved as file count_leading_zeros.pyx
import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def count_leading_zeros(np.ndarray[DTYPE_t, ndim=1] a):
    cdef int elements = a.size
    cdef int i = 0
    cdef int count = 0
    while i < elements:
        if a[i] == 0:
            count += 1
        else:
            return count
        i += 1
    return count

This is similar to @mtrw's answer but with indexing at native speeds. My Cython is a bit sketchy so there may be further improvements to be made.

A quick test of an extremely favourable case with IPython with a few different methods

In [1]: import numpy as np

In [2]: import pyximport; pyximport.install()
Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x53e9250>)

In [3]: import count_leading_zeros

In [4]: %paste
def count_leading_zeros_python(x):
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else:
            return ctr
    return ctr
## -- End pasted text --
In [5]: a = np.zeros((10000000,), dtype=np.int)

In [6]: a[5] = 1

In [7]: 

In [7]: %timeit np.min(np.nonzero(np.hstack((a, 1))))
10 loops, best of 3: 91.1 ms per loop

In [8]: 

In [8]: %timeit np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
10 loops, best of 3: 107 ms per loop

In [9]: 

In [9]: %timeit count_leading_zeros_python(a)
100000 loops, best of 3: 3.87 µs per loop

In [10]: 

In [10]: %timeit count_leading_zeros.count_leading_zeros(a)
1000000 loops, best of 3: 489 ns per loop

However I'd only use something like this if I had evidence (with a profiler) that this was a bottleneck. Many things may seem inefficient but are never worth your time to fix.

like image 200
YXD Avatar answered Oct 13 '22 19:10

YXD


By adding a nonzero number at the end of the array, you can still use np.nonzero to get your desired outcome.

A = np.array([0,1,2])
B = np.array([0,0,0])

np.min(np.nonzero(np.hstack((A, 1))))   # --> 1
np.min(np.nonzero(np.hstack((B, 1))))   # --> 3
like image 44
tanemaki Avatar answered Oct 13 '22 20:10

tanemaki