Imagine a table like this:
name | value
-----|------
Jack | 0
Jack | 1
Jack | 0.5
Jack | 1
Jill | 0
Jill | 2
For every name, I'd like to have the cumulative average, like this:
name | value | cumAverage
-----|-------|-----------
Jack | 0 | 0
Jack | 1 | 0.5
Jack | 0.5 | 0.5
Jack | 1 | 0.625
Jill | 0 | 0
Jill | 2 | 1
In other words, the cumulative average should "restart" for each new name. The name column is sorted, so when a new name appears, the running average for the previous name is complete.
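To make the "restart" rule concrete, here is a minimal plain-Python sketch (no pandas). It assumes, as the question states, that rows with the same name are contiguous. `itertools.groupby` only merges consecutive equal keys, so it matches that assumption exactly. The helper name `cumulative_averages` is hypothetical.

```python
from itertools import groupby

# Sample data from the question; names are assumed to be sorted (contiguous).
rows = [("Jack", 0), ("Jack", 1), ("Jack", 0.5), ("Jack", 1), ("Jill", 0), ("Jill", 2)]

def cumulative_averages(rows):
    """Yield (name, value, cumAverage), restarting the running mean for each name."""
    # groupby() starts a new group whenever the key changes between consecutive rows.
    for name, group in groupby(rows, key=lambda r: r[0]):
        total = 0.0
        for count, (_, value) in enumerate(group, start=1):
            total += value
            yield name, value, total / count

result = list(cumulative_averages(rows))
for name, value, avg in result:
    print(name, value, avg)
```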
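Use `groupby` together with `expanding().mean()`:

```python
df.groupby('name')['value'].expanding().mean().reset_index(0)
```

The result is indexed by `(name, original row label)`; `reset_index(0)` moves the `name` level back into a column and leaves the original row labels as the index.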
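If you want the result as a new `cumAverage` column next to the original values (as in the desired output), one option is to drop the `name` level entirely and assign the Series back to `df`. This is a sketch that rebuilds the sample data from the question; pandas aligns the assignment on the row index, so the column lines up with the original rows.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "value": [0, 1, 0.5, 1, 0, 2],
})

# groupby(...).expanding().mean() returns a Series indexed by (name, original row label).
# Dropping the "name" level leaves the original row labels, so the Series aligns
# with df when assigned as a new column.
df["cumAverage"] = (
    df.groupby("name")["value"]
      .expanding()
      .mean()
      .reset_index(level=0, drop=True)
)
print(df)
```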
If `df` is not sorted by `name`, append `sort_index()` to restore the original row order:

```python
df.groupby('name')['value'].expanding().mean().reset_index(0).sort_index()
```
Output (the `value` column now holds the cumulative average, not the original values):

```
   name  value
0  Jack  0.000
1  Jack  0.500
2  Jack  0.500
3  Jack  0.625
4  Jill  0.000
5  Jill  1.000
```
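As an alternative to `expanding().mean()`, a running sum divided by a running row count gives the same cumulative average. `cumsum()` and `cumcount()` on a groupby both return results aligned to the original index, so this sketch works on unsorted data without any re-sorting. The interleaved sample below is made up for illustration.

```python
import pandas as pd

# Unsorted variant of the sample data (rows interleaved on purpose).
df = pd.DataFrame({
    "name": ["Jack", "Jill", "Jack", "Jack", "Jill", "Jack"],
    "value": [0, 0, 1, 0.5, 2, 1],
})

g = df.groupby("name")["value"]
# Running sum within each name divided by the 1-based row number within that name.
df["cumAverage"] = g.cumsum() / (g.cumcount() + 1)
print(df)
```

One difference to keep in mind: `cumcount()` counts every row, while `expanding().mean()` skips missing values. The two approaches can therefore disagree if `value` contains `NaN`.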