Pandas - Row number since last greater than 0 value

Let's say I have a Pandas series like so:

import pandas as pd

pd.Series([1, 0, 0, 1, 0, 0, 0], name='series')

How would I add a column that counts the rows since the last value greater than 0, like so:

pd.DataFrame({
    'series': [1, 0, 0, 1, 0, 0, 0],
    'row_num': [0, 1, 2, 0, 1, 2, 3]
})
Chris C asked Jul 08 '19 21:07


2 Answers

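Try this, where s is the series from the question. Because the values are never negative, s.cumsum() steps up at every value greater than 0 and so labels each run; cumcount() then numbers the rows within each run: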

s.groupby(s.cumsum()).cumcount()

Output:

0    0
1    1
2    2
3    0
4    1
5    2
6    3
dtype: int64
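
To see why this works, here is a minimal sketch that spells out the intermediate group key and builds the DataFrame from the question. Using s.gt(0).cumsum() as the key is my variation rather than part of the answer above; it should behave the same on this data and, as far as I can tell, is safer if the series can contain negative values. The names key, row_num and df are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 0, 0, 1, 0, 0, 0], name='series')

# Group key: steps up by one at every value greater than 0,
# so all rows belonging to the same run share one key.
key = s.gt(0).cumsum()

# Number the rows inside each run, starting from 0.
row_num = s.groupby(key).cumcount()

df = pd.DataFrame({'series': s, 'row_num': row_num})
print(df)
```

Rows that come before the first value greater than 0 all share key 0, so they are simply counted from the start of the series.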
Scott Boston answered Nov 11 '22 02:11


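NumPy

  • Find the positions where the series/array is greater than 0
  • Calculate the differences from one position to the next (with the length appended, so the last run is included) to get the length of each run
  • Repeat each position by its run length and subtract the result from the sequence 0, 1, ..., n - 1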

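import numpy as np

# s is the series from the question
i = np.flatnonzero(s)             # positions of the nonzero values
n = len(s)
delta = np.diff(np.append(i, n))  # length of the run starting at each position
r = np.arange(n)
r - r[i].repeat(delta)            # row position minus the start of its run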

array([0, 1, 2, 0, 1, 2, 3])
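
The snippet above relies on the first value being nonzero; if the series starts with zeros, the repeated array no longer lines up with the sequence. Below is a hedged sketch of the same idea wrapped in a function that also covers that case by treating position 0 as the start of a run. The function name rows_since_last_positive is hypothetical, and the explicit a > 0 test (instead of a nonzero test) is my assumption about what is wanted when negative values appear.

```python
import numpy as np

def rows_since_last_positive(values):
    """Hypothetical helper: row count since the last value greater than 0."""
    a = np.asarray(values)
    n = len(a)
    r = np.arange(n)

    # Start of each run: every position holding a value greater than 0.
    starts = np.flatnonzero(a > 0)

    # If the data does not begin with a positive value, treat position 0
    # as the start of a run so the leading rows are counted from the top.
    if n and (starts.size == 0 or starts[0] != 0):
        starts = np.insert(starts, 0, 0)

    # Length of each run, with n appended so the last run is included.
    run_lengths = np.diff(np.append(starts, n))

    # Subtract each row's run start from its own position.
    return r - np.repeat(starts, run_lengths)

print(rows_since_last_positive([1, 0, 0, 1, 0, 0, 0]))
print(rows_since_last_positive([0, 0, 1, 0]))
```

The result is a plain NumPy array, so it can be assigned straight to a new DataFrame column.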
piRSquared answered Nov 11 '22 02:11