I'm trying to compute an expanding mean per group. It works when I iterate and "group" by filtering on each specific value, but that takes far too long. I feel like this should be easy to do with a groupby, but when I try, the expanding mean is computed over the entire dataset instead of separately for each group in the groupby.
For a quick example:
I want to take this (in this particular case, grouped by 'player' and 'year') and get an expanding mean.
player pos year wk pa ra
a qb 2001 1 10 0
a qb 2001 2 5 0
a qb 2001 3 10 0
a qb 2002 1 12 0
a qb 2002 2 13 0
b rb 2001 1 0 20
b rb 2001 2 0 17
b rb 2001 3 0 12
b rb 2002 1 0 14
b rb 2002 2 0 15
to get:
player pos year wk pa ra avg_pa avg_ra
a qb 2001 1 10 0 10 0
a qb 2001 2 5 0 7.5 0
a qb 2001 3 10 0 8.3 0
a qb 2002 1 12 0 12 0
a qb 2002 2 13 0 12.5 0
b rb 2001 1 0 20 0 20
b rb 2001 2 0 17 0 18.5
b rb 2001 3 0 12 0 16.3
b rb 2002 1 0 14 0 14
b rb 2002 2 0 15 0 14.5
Not sure where I'm going wrong:
# Group by player and season - also put weeks in correct ascending order
grouped = calc_averages.groupby(['player','pos','seas']).apply(pd.DataFrame.sort_values, 'wk')
grouped['avg_pa'] = grouped['pa'].expanding().mean()
But this gives an expanding mean over the entire set, not a separate one for each player and season.
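Try calling expanding() on the groupby object itself, so the mean restarts within each group. Note the double brackets when selecting several columns; the single-bracket form ['pa','ra'] is no longer accepted by recent pandas versions:
(df.sort_values('wk')
   .groupby(['player','pos','year'])[['pa','ra']]
   .expanding().mean()
   .reset_index())
Output (level_3 is the original row index):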
player pos year level_3 pa ra
0 a qb 2001 0 10.000000 0.000000
1 a qb 2001 1 7.500000 0.000000
2 a qb 2001 2 8.333333 0.000000
3 a qb 2002 3 12.000000 0.000000
4 a qb 2002 4 12.500000 0.000000
5 b rb 2001 5 0.000000 20.000000
6 b rb 2001 6 0.000000 18.500000
7 b rb 2001 7 0.000000 16.333333
8 b rb 2002 8 0.000000 14.000000
9 b rb 2002 9 0.000000 14.500000
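If you want the averages as new avg_pa and avg_ra columns on the original frame, as in the desired output in the question, one option is groupby().transform(), which returns a result aligned to the original index. This is only a minimal sketch: the DataFrame below is rebuilt by hand from the sample data in the question, and the names df and keys are just illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout).
df = pd.DataFrame({
    "player": ["a"] * 5 + ["b"] * 5,
    "pos": ["qb"] * 5 + ["rb"] * 5,
    "year": [2001, 2001, 2001, 2002, 2002] * 2,
    "wk": [1, 2, 3, 1, 2] * 2,
    "pa": [10, 5, 10, 12, 13, 0, 0, 0, 0, 0],
    "ra": [0, 0, 0, 0, 0, 20, 17, 12, 14, 15],
})

keys = ["player", "pos", "year"]

# Sort so that weeks are in ascending order within each group.
df = df.sort_values(keys + ["wk"])

# transform() keeps the original index, so each result can be
# assigned straight back as a new column.
for col in ["pa", "ra"]:
    df[f"avg_{col}"] = (
        df.groupby(keys)[col].transform(lambda s: s.expanding().mean())
    )

print(df)
```

The lambda runs once per group, so on very large data the groupby(...).expanding().mean() form above may be faster; in that case you would need to drop the group levels from the resulting index before assigning back.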