I have a DataFrame df where each record represents a soccer game. Teams appear more than once. I need to compute a kind of rolling mean of each team's scores (not exactly a rolling mean in the strict sense).
           date     home     away  score_h  score_a
166  2013-09-01   Fulham  Chelsea        0        0
167  2013-09-03  Arsenal  Everton        0        2
164  2013-09-05  Arsenal  Swansea        5        1
165  2013-09-06   Fulham  Norwich        0        1
163  2013-09-18  Arsenal  Swansea        0        0
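To make the examples reproducible, here is a sketch that rebuilds the sample frame above. It assumes the date column should be parsed as datetimes and keeps the original row labels (163 to 167) as the index.

```python
import pandas as pd

# Sample data from the question; the original row labels are kept as the index
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2013-09-01", "2013-09-03", "2013-09-05",
                                "2013-09-06", "2013-09-18"]),
        "home": ["Fulham", "Arsenal", "Arsenal", "Fulham", "Arsenal"],
        "away": ["Chelsea", "Everton", "Swansea", "Norwich", "Swansea"],
        "score_h": [0, 0, 5, 0, 0],
        "score_a": [0, 2, 1, 1, 0],
    },
    index=[166, 167, 164, 165, 163],
)
print(df)
```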
What I need to calculate is the mean score for each team (home and away).
For brevity, let's just do the home column:
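import pandas as pd
# sort the rows of each 'home' group by date (ascending order);
# the result is indexed by (home, original row label)
grouped = df.groupby('home', group_keys=True).apply(lambda g: g.sort_values('date'))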
This results in:
                   date     home     away  score_h  score_a
home
Arsenal 167  2013-09-03  Arsenal  Everton        0        2
        164  2013-09-05  Arsenal  Swansea        5        1
        163  2013-09-18  Arsenal  Swansea        0        0
Fulham  166  2013-09-01   Fulham  Chelsea        0        0
        165  2013-09-06   Fulham  Norwich        0        1
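The ordering requirement can also be met without sorting inside each group. A minimal sketch on the sample data: sort by date once, then group; pandas keeps the row order within each group, so each team's rows come out oldest first. The name `order` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2013-09-01", "2013-09-03", "2013-09-05",
                                "2013-09-06", "2013-09-18"]),
        "home": ["Fulham", "Arsenal", "Arsenal", "Fulham", "Arsenal"],
        "away": ["Chelsea", "Everton", "Swansea", "Norwich", "Swansea"],
        "score_h": [0, 0, 5, 0, 0],
        "score_a": [0, 2, 1, 1, 0],
    },
    index=[166, 167, 164, 165, 163],
)

# Sort once by date; groupby preserves this row order inside each group
ordered = df.sort_values("date")
order = {team: group.index.tolist() for team, group in ordered.groupby("home")}
print(order)  # {'Arsenal': [167, 164, 163], 'Fulham': [166, 165]}
```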
Question starts here
Now I need to compute a "rolling mean" for each team. Let's do it by hand for the group named Arsenal. At the end we should have 2 extra columns; let's call them rmean_h and rmean_a. The first record in the group (167) has scores of 0 and 2, so its rmean values are simply 0 and 2. For the second record (164), the rmeans are (0+5)/2 = 2.5 and (2+1)/2 = 1.5, and for the third record (163), (0+5+0)/3 ≈ 1.67 and (2+1+0)/3 = 1. In other words, each row gets the mean of all of the team's scores up to and including that game.
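The hand calculation above is an expanding (cumulative) mean: row k averages rows 0 through k. A sketch for the Arsenal group only, with its three rows typed in by hand in ascending date order:

```python
import pandas as pd

# Arsenal's home games from the question, oldest first
arsenal = pd.DataFrame(
    {"score_h": [0, 5, 0], "score_a": [2, 1, 0]},
    index=[167, 164, 163],
)

# expanding().mean(): each row is the mean of that row and all rows before it
rmean = arsenal.expanding().mean()
print(rmean.round(2))
```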
Our DataFrame should now look like this:
                   date     home     away  score_h  score_a  rmean_h  rmean_a
home
Arsenal 167  2013-09-03  Arsenal  Everton        0        2        0        2
        164  2013-09-05  Arsenal  Swansea        5        1      2.5      1.5
        163  2013-09-18  Arsenal  Swansea        0        0     1.67        1
Fulham  166  2013-09-01   Fulham  Chelsea        0        0
        165  2013-09-06   Fulham  Norwich        0        1
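A sketch that produces both columns for every home team at once, assuming the sample frame from the question and a pandas version with `.expanding()` (0.18 or later). `transform` keeps the result aligned with the original row labels, so the new columns can be assigned back directly.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2013-09-01", "2013-09-03", "2013-09-05",
                                "2013-09-06", "2013-09-18"]),
        "home": ["Fulham", "Arsenal", "Arsenal", "Fulham", "Arsenal"],
        "away": ["Chelsea", "Everton", "Swansea", "Norwich", "Swansea"],
        "score_h": [0, 0, 5, 0, 0],
        "score_a": [0, 2, 1, 1, 0],
    },
    index=[166, 167, 164, 165, 163],
)

# Sort once by date; groupby keeps this order inside each group
df = df.sort_values("date")
scores = df.groupby("home")[["score_h", "score_a"]]

# Expanding mean per group; transform returns rows aligned with df's index
rmeans = scores.transform(lambda s: s.expanding().mean())
df["rmean_h"] = rmeans["score_h"]
df["rmean_a"] = rmeans["score_a"]

# Display grouped by team, oldest game first
print(df.sort_values(["home", "date"]))
```

The Fulham rows are filled in as well: 166 gets 0 and 0, and 165 gets (0+0)/2 = 0 and (0+1)/2 = 0.5.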
I want to carry out these calculations for all of my data. Do you have any suggestions?
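You can apply an expanding mean (see the docs) to each group. In pandas 0.18 and later, .expanding().mean() replaces the old pd.expanding_mean function, and sort_values replaces the removed DataFrame.sort:
import pandas as pd
grouped = df.sort_values('date').groupby('home')
# expanding (cumulative) mean of each team's home scores, aligned to df's rows
df['rmean_h'] = grouped['score_h'].transform(lambda s: s.expanding().mean())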
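Because an expanding mean is just a running total divided by a running count, the same columns can also be built from `cumsum` and `cumcount` without a Python-level lambda. This is a swapped-in alternative to the expanding mean above, sketched on the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2013-09-01", "2013-09-03", "2013-09-05",
                                "2013-09-06", "2013-09-18"]),
        "home": ["Fulham", "Arsenal", "Arsenal", "Fulham", "Arsenal"],
        "away": ["Chelsea", "Everton", "Swansea", "Norwich", "Swansea"],
        "score_h": [0, 0, 5, 0, 0],
        "score_a": [0, 2, 1, 1, 0],
    },
    index=[166, 167, 164, 165, 163],
)

df = df.sort_values("date")
g = df.groupby("home")

# Games seen so far in each group (cumcount starts at 0, hence + 1)
n = g.cumcount() + 1

# Running total / running count = expanding mean; Series align on df's index
df["rmean_h"] = g["score_h"].cumsum() / n
df["rmean_a"] = g["score_a"].cumsum() / n
print(df)
```

Both versions need the rows sorted by date before grouping; the cumulative form simply avoids calling a lambda once per group.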