
Backward count in a NumPy array

Let's suppose I have a series of integer values arranged in a numpy array like this.

import numpy as np

nan = np.nan
arr = np.array([3, nan, nan, nan, 5, nan, nan, nan, nan, nan])

The NaN values should be filled with a backward count, starting from the preceding non-null value and running down to zero:

[3, 2, 1, 0, 5, 4, 3, 2, 1, 0]
Marco Fumagalli asked Dec 14 '22 11:12


2 Answers

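IMO, the simplest pandas way of doing this is to use groupby and cumcount with ascending=False. Note that this counts positions from the end of each run, so it reproduces the expected output only when each non-null value equals the number of NaNs that follow it, as in the sample: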

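import pandas as pd

s = pd.Series(np.cumsum(~np.isnan(arr)))  # run label: increments at each non-NaN
s.groupby(s).cumcount(ascending=False)    # position counted from the end of each run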

0    3
1    2
2    1
3    0
4    5
5    4
6    3
7    2
8    1
9    0
dtype: int64
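
If the stored values themselves should drive the count (rather than the run lengths), one possible variant is to forward-fill each value and subtract the position within its run. This is only a sketch and not part of the answer above; backward_count_pd is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd

def backward_count_pd(values):
    """Hypothetical helper: count down from each non-NaN value itself."""
    s = pd.Series(values, dtype="float64")
    # Label each run: the counter increments at every non-NaN entry.
    run_id = s.notna().cumsum()
    # Position inside the run: 0 at the non-NaN entry, 1 at the next, ...
    offset = s.groupby(run_id).cumcount()
    # Carry each starting value forward, then subtract the offset.
    # Leading NaNs stay NaN because there is nothing to carry forward.
    return (s.ffill() - offset).to_numpy()

arr = np.array([3, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan])
print(backward_count_pd(arr))
```

Unlike the cumcount(ascending=False) approach, this does not require a run's length to match its starting value, so a run may stop before reaching zero or go below it.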
cs95 answered Jan 03 '23 17:01


Here's a vectorized one with NumPy -

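def backward_count(a):
    # Mask and positions of the non-NaN "start" values
    m = ~np.isnan(a)
    idx = np.flatnonzero(m)

    # Step array: -1 everywhere, so a cumulative sum counts down by one
    p = np.full(len(a), -1, dtype=a.dtype)
    # At the first start, cancel the idx[0] leading -1 steps and land on its value
    p[idx[0]] = a[idx[0]]+idx[0]

    # At each later start, jump from where the previous countdown ended to the new value
    d = np.diff(idx)
    p[idx[1:]] = np.diff(a[m]) + d - 1
    out = p.cumsum()
    # Positions before the first start have nothing to count from
    out[:idx[0]] = np.nan
    return out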

Sample run with a more generic case -

In [238]: a
Out[238]: array([nan,  3., nan,  5., nan, 10., nan, nan,  4., nan, nan])

In [239]: backward_count(a)
Out[239]: array([nan,  3.,  2.,  5.,  4., 10.,  9.,  8.,  4.,  3.,  2.])
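
To sanity-check a vectorized result like the one above, a plain loop can serve as a reference. The version below is a minimal sketch written for this purpose (backward_count_loop is a hypothetical name, not from the answer); it is slow but easy to verify by eye.

```python
import numpy as np

def backward_count_loop(a):
    """Hypothetical reference implementation: simple loop, for checking only."""
    out = np.full(len(a), np.nan)
    current = np.nan  # stays NaN until the first non-NaN value is seen
    for i, v in enumerate(a):
        if not np.isnan(v):
            current = v            # restart the countdown at this value
        else:
            current = current - 1  # NaN - 1 is still NaN for leading gaps
        out[i] = current
    return out

a = np.array([np.nan, 3, np.nan, 5, np.nan, 10, np.nan, np.nan, 4, np.nan, np.nan])
expected = np.array([np.nan, 3, 2, 5, 4, 10, 9, 8, 4, 3, 2])
# assert_array_equal treats NaNs in matching positions as equal
np.testing.assert_array_equal(backward_count_loop(a), expected)
```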

Benchmarking

Setup, scaling up the given sample by 10,000x -

In [240]: arr = np.array([3, nan, nan, nan, 5, nan, nan, nan, nan, nan])

In [241]: arr = np.tile(arr,10000)

# Pandas based one by @cs95
In [243]: %%timeit
     ...: s = pd.Series(np.cumsum(~np.isnan(arr)))
     ...: s.groupby(s).cumcount(ascending=False)
35.9 ms ± 258 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [245]: %timeit backward_count(arr)
3.04 ms ± 4.35 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
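
Outside IPython, a comparable measurement could be made with the standard timeit module. The script below is a rough sketch under that assumption: pandas_count is a hypothetical wrapper around the pandas snippet, the repeat counts are arbitrary, and timings will differ by machine and library version.

```python
import timeit
import numpy as np
import pandas as pd

def backward_count(a):
    # Same NumPy approach as in the answer above
    m = ~np.isnan(a)
    idx = np.flatnonzero(m)
    p = np.full(len(a), -1, dtype=a.dtype)
    p[idx[0]] = a[idx[0]] + idx[0]
    p[idx[1:]] = np.diff(a[m]) + np.diff(idx) - 1
    out = p.cumsum()
    out[:idx[0]] = np.nan
    return out

def pandas_count(a):
    # Hypothetical wrapper around the groupby/cumcount snippet
    s = pd.Series(np.cumsum(~np.isnan(a)))
    return s.groupby(s).cumcount(ascending=False).to_numpy()

nan = np.nan
arr = np.tile(np.array([3, nan, nan, nan, 5, nan, nan, nan, nan, nan]), 10000)

# Both approaches should agree on this input before timing them
np.testing.assert_array_equal(backward_count(arr), pandas_count(arr))

for name, fn in [("pandas", pandas_count), ("numpy", backward_count)]:
    # Best of 3 repeats, 5 calls each, reported per call
    best = min(timeit.repeat(lambda: fn(arr), number=5, repeat=3)) / 5
    print(f"{name}: {best * 1e3:.2f} ms per call")
```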
Divakar answered Jan 03 '23 15:01