finding streaks in pandas dataframe

I have a pandas dataframe as follows:

time    winner  loser   stat
1       A       B       0
2       C       B       0
3       D       B       1
4       E       B       0
5       F       A       0
6       G       A       0
7       H       A       0
8       I       A       1

each row is a match result. the first column is the time of the match, second and third column contain winner/loser and the fourth column is one stat from the match.

I want to detect streaks of zeros for this stat per loser.

The expected result should look like this:

time    winner  loser   stat    streak
1       A       B       0       1
2       C       B       0       2
3       D       B       1       0
4       E       B       0       1
5       F       A       0       1
6       G       A       0       2
7       H       A       0       3
8       I       A       1       0

In pseudocode the algorithm should work like this:

.groupby loser column.
then iterate over each row of each loser group
in each row, look at the stat column: if it contains 0, then increment the streak value from the previous row by 0. if it is not 0, then start a new streak, that is, put 0 into the streak column.

So the .groupby is clear. But then I would need some sort of .apply where I can look at the previous row? this is where I am stuck.

How do you see all the lines in Pandas?

Set value of display. max_rows to None and pass it to set_option and this will display all rows from the data frame.

How do I find rows and columns in Pandas?

To get the number of rows, and columns we can use len(df. axes[]) function in Python.

How do you find the mean of multiple rows in Pandas?

To get column average or mean from pandas DataFrame use either mean() and describe() method. The DataFrame. mean() method is used to return the mean of the values for the requested axis.

You can apply custom function f, then cumsum, cumcount and astype:

def f(x):
    x['streak'] = x.groupby( (x['stat'] != 0).cumsum()).cumcount() + 
                  ( (x['stat'] != 0).cumsum() == 0).astype(int) 
    return x

df = df.groupby('loser', sort=False).apply(f)
print df
   time winner loser  stat  streak
0     1      A     B     0       1
1     2      C     B     0       2
2     3      D     B     1       0
3     4      E     B     0       1
4     5      F     A     0       1
5     6      G     A     0       2
6     7      H     A     0       3
7     8      I     A     1       0

For better undestanding:

def f(x):
    x['c'] = (x['stat'] != 0).cumsum()
    x['a'] = (x['c'] == 0).astype(int)
    x['b'] = x.groupby( 'c' ).cumcount()

    x['streak'] = x.groupby( 'c' ).cumcount() + x['a']

    return x
df = df.groupby('loser', sort=False).apply(f)
print df
   time winner loser  stat  c  a  b  streak
0     1      A     B     0  0  1  0       1
1     2      C     B     0  0  1  1       2
2     3      D     B     1  1  0  0       0
3     4      E     B     0  1  0  1       1
4     5      F     A     0  0  1  0       1
5     6      G     A     0  0  1  1       2
6     7      H     A     0  0  1  2       3
7     8      I     A     1  1  0  0       0

Not as elegant as jezrael's answer, but for me easier to understand...

First, define a function that works with a single loser:

def f(df):
    df['streak2'] = (df['stat'] == 0).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['stat'] == 1, 'cumsum'] = df['streak2']
    df['cumsum'] = df['cumsum'].fillna(method='ffill')
    df['cumsum'] = df['cumsum'].fillna(0)
    df['streak'] = df['streak2'] - df['cumsum']
    df.drop(['streak2', 'cumsum'], axis=1, inplace=True)
    return df

The streak is essentially a cumsum, but we need to reset it each time stat is 1. We therefore subtract the value of the cumsum where stat is 1, carried forward until the next 1.

Then groupby and apply by loser:

df.groupby('loser').apply(f)

The result is as expected.

finding streaks in pandas dataframe

Tags:

python

pandas

dataframe

beta

People also ask

2 Answers

jezrael

IanS

Recent Activity

Donate For Us

finding streaks in pandas dataframe

Tags:

python

pandas

dataframe

beta

People also ask

2 Answers

jezrael

IanS

Related questions

Recent Activity

Donate For Us