Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting consecutive 1's in NumPy array

[1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

I have a NumPy array consisting of 0's and 1's like above. How can I add all consecutive 1's like below? Any time I encounter a 0, I reset.

[1, 2, 3, 0, 0, 0, 1, 2, 0, 0]

I can do this using a for loop, but is there a vectorized solution using NumPy?

like image 430
user308827 Avatar asked Feb 09 '17 05:02

user308827


People also ask

What does ones do in NumPy?

Python numpy. ones() function returns a new array of given shape and data type, where the element's value is set to 1. This function is very similar to numpy zeros() function.

Does NumPy have count function?

core. defchararray. count(arr, substring, start=0, end=None) : Counts for the non-overlapping occurrence of sub-string in the specified range.


2 Answers

Here's a vectorized approach -

def island_cumsum_vectorized(a):
    a_ext = np.concatenate(( [0], a, [0] ))
    idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
    a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
    return a_ext.cumsum()[1:-1]

Sample run -

In [91]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [92]: island_cumsum_vectorized(a)
Out[92]: array([1, 2, 3, 0, 0, 0, 1, 2, 0, 0])

In [93]: a = np.array([0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1])

In [94]: island_cumsum_vectorized(a)
Out[94]: array([0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 1])

Runtime test

For the timings , I would use OP's sample input array and repeat/tile it and hopefully this should be a less opportunistic benchmark -

Small case :

In [16]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [17]: a = np.tile(a,10)  # Repeat OP's data 10 times

# @Paul Panzer's solution
In [18]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
10000 loops, best of 3: 73.4 µs per loop

In [19]: %timeit island_cumsum_vectorized(a)
100000 loops, best of 3: 8.65 µs per loop

Bigger case :

In [20]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [21]: a = np.tile(a,1000)  # Repeat OP's data 1000 times

# @Paul Panzer's solution
In [22]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
100 loops, best of 3: 6.52 ms per loop

In [23]: %timeit island_cumsum_vectorized(a)
10000 loops, best of 3: 49.7 µs per loop

Nah, I want really huge case :

In [24]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [25]: a = np.tile(a,100000)  # Repeat OP's data 100000 times

# @Paul Panzer's solution
In [26]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
1 loops, best of 3: 725 ms per loop

In [27]: %timeit island_cumsum_vectorized(a)
100 loops, best of 3: 7.28 ms per loop
like image 94
Divakar Avatar answered Sep 25 '22 00:09

Divakar


If a list comprehension is acceptable

np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
like image 37
Paul Panzer Avatar answered Sep 24 '22 00:09

Paul Panzer