
Get cumulative mean among groups in Python

I am trying to get a cumulative mean in Python among different groups. I have data as follows:

id  date        value
1   2019-01-01  2
1   2019-01-02  8
1   2019-01-04  3
1   2019-01-08  4
1   2019-01-10  12
1   2019-01-13  6
2   2019-01-01  4
2   2019-01-03  2
2   2019-01-04  3
2   2019-01-06  6
2   2019-01-11  1

The output I'm trying to get is something like this:

id  date        value   cumulative_avg
1   2019-01-01  2   NaN
1   2019-01-02  8   2
1   2019-01-04  3   5
1   2019-01-08  4   4.33
1   2019-01-10  12  4.25
1   2019-01-13  6   5.8
2   2019-01-01  4   NaN
2   2019-01-03  2   4
2   2019-01-04  3   3
2   2019-01-06  6   3
2   2019-01-11  1   3.75

I need the cumulative average to restart with each new id. I can get a variation of what I'm looking for with a single id. For example, if the data set only had the rows where id = 1, I could use:

df['cumulative_avg'] = df['value'].expanding().mean().shift(1)

When I try to add a groupby to it, I get an error:

df['cumulative_avg'] = df.groupby('id')['value'].expanding().mean().shift(1)

TypeError: incompatible index of inserted column with frame index

Also tried:

df.set_index(['account']
ValueError: cannot handle a non-unique multi-index!

The actual data I have has millions of rows and thousands of unique ids. Any help with a speedy/efficient way to do this would be appreciated.

Steveiepete asked Dec 17 '22 14:12


1 Answer

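For many groups this will perform better because it ditches the apply. Within each group, take the cumsum, subtract off the current row's value, and divide by the cumcount; that gives the mean of all earlier rows, which is the analog of expanding().mean().shift(1). Fortunately pandas interprets 0/0 as NaN, so the first row of each group comes out as NaN.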

gp = df.groupby('id')['value']
df['cum_avg'] = (gp.cumsum() - df['value'])/gp.cumcount()

    id        date  value   cum_avg
0    1  2019-01-01      2       NaN
1    1  2019-01-02      8  2.000000
2    1  2019-01-04      3  5.000000
3    1  2019-01-08      4  4.333333
4    1  2019-01-10     12  4.250000
5    1  2019-01-13      6  5.800000
6    2  2019-01-01      4       NaN
7    2  2019-01-03      2  4.000000
8    2  2019-01-04      3  3.000000
9    2  2019-01-06      6  3.000000
10   2  2019-01-11      1  3.750000
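
To make the above reproducible, here is a minimal sketch that rebuilds the sample data from the question and cross-checks the result against a slower per-group expanding mean. The column name cum_avg_check and the use of transform with a lambda are illustrative assumptions, not part of the original answer. It also assumes the rows are already sorted by date within each id, since cumsum and cumcount follow row order.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed already sorted by date within each id).
df = pd.DataFrame({
    "id": [1] * 6 + [2] * 5,
    "date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-04", "2019-01-08",
        "2019-01-10", "2019-01-13",
        "2019-01-01", "2019-01-03", "2019-01-04", "2019-01-06",
        "2019-01-11",
    ]),
    "value": [2, 8, 3, 4, 12, 6, 4, 2, 3, 6, 1],
})

gp = df.groupby("id")["value"]

# (running sum minus the current value) = sum of all earlier rows in the group;
# cumcount() = number of earlier rows in the group.
# The first row of each group is 0 / 0, which pandas evaluates to NaN.
df["cum_avg"] = (gp.cumsum() - df["value"]) / gp.cumcount()

# Cross-check (hypothetical column name): the same quantity via a per-group
# expanding mean shifted by one row. transform keeps the original index, so
# the assignment aligns, but it calls Python code once per group.
df["cum_avg_check"] = gp.transform(lambda s: s.expanding().mean().shift(1))

print(df)
```

The transform version is only there as a sanity check; with thousands of ids, the cumsum/cumcount version avoids calling a Python function once per group.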

ALollz answered Dec 22 '22 00:12