Numpy sum running length of non-zero values

I'm looking for a fast, vectorized function that returns the running count of consecutive non-zero values. The count should reset to 0 whenever a zero is encountered. The result should have the same shape as the input array.

Given an array like this:

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

The function should return this:

array([1, 2, 3, 0, 0, 1, 0, 1, 2])
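The required behaviour can be pinned down with a plain Python loop, which is also handy for checking any faster version against. This is a minimal sketch assuming a 1-D input; the name sumrunlen_reference is hypothetical and not part of the question or either answer.

```python
import numpy as np

def sumrunlen_reference(x):
    """Hypothetical reference: running count of consecutive non-zeros, reset to 0 at each zero.

    Assumes x is 1-D.
    """
    x = np.asarray(x)
    out = np.zeros(x.shape, dtype=int)
    count = 0
    for i, value in enumerate(x):
        # Increment while inside a run of non-zeros; reset to 0 on a zero
        count = count + 1 if value != 0 else 0
        out[i] = count
    return out

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
print(sumrunlen_reference(x))  # [1 2 3 0 0 1 0 1 2]
```

Being a Python-level loop, this is slow on large arrays; it only serves as a definition of the expected output.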
asked Apr 26 '15 02:04 by steve


2 Answers

This answer describes a vectorized approach that consists of two steps:

  1. Initialize a vector of zeros with the same size as the input vector x, and set ones at the positions corresponding to the non-zeros of x.

  2. Next, in that vector, put the negated run length of each "island" of non-zeros at the position right after that island ends. A cumsum over the vector then produces sequential counts within each island and zeros elsewhere.
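Applied by hand to the question's array, the two steps look like this. The island boundaries are hard-coded here purely for illustration; the implementation that follows derives them with np.diff.

```python
import numpy as np

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

# Step 1: ones at the non-zero positions of x, zeros elsewhere
vec = (x != 0).astype(int)  # [1 1 1 0 0 1 0 1 1]

# Step 2: right after each island ends, put minus that island's length.
# Islands here are x[0:3] (length 3), x[5:6] (length 1) and x[7:9] (length 2);
# the last island reaches the end of x, so it needs no correction.
vec[3] = -3
vec[6] = -1  # vec now holds 1, 1, 1, -3, 0, 1, -1, 1, 1

# The cumulative sum counts up inside each island and falls back to 0 after it
print(vec.cumsum())  # [1 2 3 0 0 1 0 1 2]
```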

Here's the implementation -

import numpy as np

def sumrunlen_vectorized(x):
    # Append zeros at the start and end of the input array, x
    xa = np.hstack([[0], x, [0]])

    # Get an array of ones and zeros, with ones for non-zeros of x and zeros elsewhere
    xa1 = (xa != 0) + 0

    # Find consecutive differences on xa1
    xadf = np.diff(xa1)

    # Find start and stop+1 indices and thus the lengths of "islands" of non-zeros
    starts = np.where(xadf == 1)[0]
    stops_p1 = np.where(xadf == -1)[0]
    lens = stops_p1 - starts

    # Mark indices where the "minus lengths" are to be put for applying cumsum
    # (an island that runs to the end of x needs no correction)
    put_m1 = stops_p1[stops_p1 < x.size]

    # Set up a vector with ones for non-zero x's, "minus lens" at stops+1 and zeros elsewhere
    vec = xa1[1:-1]  # Note: a view, so this also changes xa1, which is no longer needed
    vec[put_m1] = -lens[0:put_m1.size]

    # Perform cumsum to get the desired output
    return vec.cumsum()

out = sumrunlen_vectorized(x)

Sample run -

In [116]: x
Out[116]: array([ 0. ,  2.3,  1.2,  4.1,  0. ,  0. ,  5.3,  0. ,  1.2,  3.1,  0. ])

In [117]: out
Out[117]: array([0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0], dtype=int32)

Runtime tests -

Here are some runtime tests comparing the proposed approach against the itertools.groupby-based approach from the other answer -

In [21]: N = 1000000
    ...: x = np.random.rand(1,N)
    ...: x[x>0.5] = 0.0
    ...: x = x.ravel()
    ...: 

In [19]: %timeit sumrunlen_vectorized(x)
10 loops, best of 3: 19.9 ms per loop

In [20]: %timeit sumrunlen_loopy(x)
1 loops, best of 3: 2.86 s per loop
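The timings above call sumrunlen_vectorized and sumrunlen_loopy, whose definitions the post doesn't show. A rough harness for reproducing such a comparison might look like the sketch below, assuming the two names simply wrap this answer's cumsum code and the groupby one-liner from the other answer; both definitions are assumptions. N is reduced to 100,000 so the loop version finishes quickly, and no particular timings are claimed, since they vary by machine.

```python
import timeit
from itertools import groupby

import numpy as np

def sumrunlen_vectorized(x):
    # Assumed definition: the cumsum-based approach from this answer
    xa1 = (np.hstack([[0], x, [0]]) != 0) + 0
    xadf = np.diff(xa1)
    starts = np.where(xadf == 1)[0]
    stops_p1 = np.where(xadf == -1)[0]
    lens = stops_p1 - starts
    put_m1 = stops_p1[stops_p1 < x.size]
    vec = xa1[1:-1]
    vec[put_m1] = -lens[0:put_m1.size]
    return vec.cumsum()

def sumrunlen_loopy(x):
    # Assumed definition: the itertools.groupby one-liner from the other answer
    return np.hstack([[i if j != 0 else j for i, j in enumerate(g, 1)]
                      for _, g in groupby(x, key=lambda v: v != 0)])

N = 100000  # smaller than the 1,000,000 used above, to keep the loop version quick
x = np.random.rand(N)
x[x > 0.5] = 0.0

# Both approaches should agree (values, not dtype) before their timings are compared
assert np.array_equal(sumrunlen_vectorized(x), sumrunlen_loopy(x))

for f in (sumrunlen_vectorized, sumrunlen_loopy):
    seconds = min(timeit.repeat(lambda: f(x), number=1, repeat=3))
    print(f"{f.__name__}: {seconds:.4f} s")
```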
answered Nov 04 '22 06:11 by Divakar


You can use itertools.groupby and np.hstack:

>>> import numpy as np
>>> x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])
>>> from itertools import groupby

>>> np.hstack([[i if j != 0 else j for i, j in enumerate(g, 1)] for _, g in groupby(x, key=lambda v: v != 0)])
array([ 1.,  2.,  3.,  0.,  0.,  1.,  0.,  1.,  2.])

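This groups the array elements into runs according to whether they are non-zero, then uses a list comprehension with enumerate (starting at 1) to replace each element of a non-zero run with its position in that run, leaving zeros as they are, and finally flattens the resulting list of lists with np.hstack. Because the zero runs keep their original float values, the result here is a float array.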
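To make the grouping step concrete, here is a small sketch on the question's array that prints the runs groupby produces and then builds the counts. It is a variant of the one-liner above, not the answer's exact code: it emits the int 0 for zero runs so the result stays integer-valued.

```python
from itertools import groupby

import numpy as np

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

# groupby splits x into consecutive runs sharing the same key (non-zero or not)
for is_nonzero, g in groupby(x, key=lambda v: v != 0):
    print(bool(is_nonzero), [float(v) for v in g])
# True [2.3, 1.2, 4.1]
# False [0.0, 0.0]
# True [5.3]
# False [0.0]
# True [1.2, 3.1]

# Variant: number each non-zero run 1, 2, 3, ... and emit the int 0 for zero runs,
# so the counts stay integers instead of picking up the float zeros from x
counts = [i if nonzero else 0
          for nonzero, g in groupby(x, key=lambda v: v != 0)
          for i, _ in enumerate(g, 1)]
print(np.array(counts))  # [1 2 3 0 0 1 0 1 2]
```

Like the one-liner, this is still a Python-level loop over the elements, so it does not change the performance picture from the runtime tests.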

answered Nov 04 '22 06:11 by Mazdak