Suppose I have a series of values, interspersed with NaNs, arranged in a NumPy array like this:

import numpy as np

nan = np.nan
arr = np.array([3, nan, nan, nan, 5, nan, nan, nan, nan, nan])

The NaN values should be filled with a backward count from each non-null value down to zero:
[3, 2, 1, 0, 5, 4, 3, 2, 1, 0]
IMO, the simplest pandas way of doing this is to use groupby and cumcount with ascending=False:
import pandas as pd

s = pd.Series(np.cumsum(~np.isnan(arr)))
s.groupby(s).cumcount(ascending=False)
0 3
1 2
2 1
3 0
4 5
5 4
6 3
7 2
8 1
9 0
dtype: int64
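To see why the group key works, here is the same idea step by step (a minimal sketch; the intermediate variable names are mine). Note that cumcount counts positions within each group, not the stored values, so this matches the desired output because each value in the sample equals the length of its NaN run:

```python
import numpy as np
import pandas as pd

nan = np.nan
arr = np.array([3, nan, nan, nan, 5, nan, nan, nan, nan, nan])

# cumsum over the non-NaN mask gives a label that increments at every
# non-NaN entry, so each "value followed by its NaNs" run is one group
key = np.cumsum(~np.isnan(arr))
print(key)  # [1 1 1 1 2 2 2 2 2 2]

# cumcount(ascending=False) counts down the remaining positions
# within each group: 3,2,1,0 for the first run, 5,...,0 for the second
s = pd.Series(key)
out = s.groupby(s).cumcount(ascending=False).to_numpy()
print(out)  # [3 2 1 0 5 4 3 2 1 0]
```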
Here's a vectorized one with NumPy -
def backward_count(a):
    # positions of the non-NaN anchor values
    m = ~np.isnan(a)
    idx = np.flatnonzero(m)
    # seed array: every step defaults to -1 (the countdown decrement)
    p = np.full(len(a), -1, dtype=a.dtype)
    # first anchor: offset so the running sum lands on a[idx[0]] there
    p[idx[0]] = a[idx[0]] + idx[0]
    # at each later anchor, jump from the current countdown value
    # up (or down) to the new anchor value
    d = np.diff(idx)
    p[idx[1:]] = np.diff(a[m]) + d - 1
    out = p.cumsum()
    # positions before the first non-NaN value have no anchor
    out[:idx[0]] = np.nan
    return out
Sample run with a more generic case -
In [238]: a
Out[238]: array([nan, 3., nan, 5., nan, 10., nan, nan, 4., nan, nan])
In [239]: backward_count(a)
Out[239]: array([nan, 3., 2., 5., 4., 10., 9., 8., 4., 3., 2.])
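For intuition, here are the intermediate arrays the function builds for that generic sample (a step-by-step sketch of the same computation, unrolled outside the function):

```python
import numpy as np

nan = np.nan
a = np.array([nan, 3., nan, 5., nan, 10., nan, nan, 4., nan, nan])

m = ~np.isnan(a)
idx = np.flatnonzero(m)
print(idx)  # [1 3 5 8]

# every step defaults to -1; anchors overwrite that with the jump
# needed so the running sum equals the anchor value at its position
p = np.full(len(a), -1, dtype=a.dtype)
p[idx[0]] = a[idx[0]] + idx[0]
p[idx[1:]] = np.diff(a[m]) + np.diff(idx) - 1
print(p)    # [-1.  4. -1.  3. -1.  6. -1. -1. -4. -1. -1.]

out = p.cumsum()
print(out)  # [-1.  3.  2.  5.  4. 10.  9.  8.  4.  3.  2.]
```

backward_count then overwrites out[:idx[0]] with NaN, which turns the leading -1 into the final nan.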
Setup with the given sample scaled up 10,000x -
In [240]: arr = np.array([3, nan, nan, nan, 5, nan, nan, nan, nan, nan])
In [241]: arr = np.tile(arr,10000)
# Pandas based one by @cs95
In [243]: %%timeit
...: s = pd.Series(np.cumsum(~np.isnan(arr)))
...: s.groupby(s).cumcount(ascending=False)
35.9 ms ± 258 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [245]: %timeit backward_count(arr)
3.04 ms ± 4.35 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)