I have a dataframe as below,
date hour staff
0 2019-10-01 6 A
1 2019-10-01 6 B
2 2019-10-01 6 C
3 2019-10-02 6 D
4 2019-10-02 6 B
5 2019-10-02 6 A
6 2019-10-03 6 B
7 2019-10-03 6 B
8 2019-10-03 6 B
9 2019-10-01 7 D
10 2019-10-01 7 A
11 2019-10-01 7 B
12 2019-10-01 7 C
13 2019-10-02 7 D
14 2019-10-02 7 C
15 2019-10-02 7 A
16 2019-10-03 7 B
17 2019-10-03 7 B
18 2019-10-03 7 A
I want to compute the average of unique staffs per hour, like below
hour unique_staff
6 2
7 3
Explanation :
At hour 6,
Unique_staff= 2
Oct 1st: 3(A,B,C)+ Oct 2nd: 3(D,B,A)+ Oct 3rd: 1 (B) = 3+3+1=7/3(no. of unique dates) ~2
At hour 7,
Unique_staff= 3
Oct 1st: 4(D,A,B,C)+ Oct 2nd: 3(D,C,A)+ Oct 3rd: 2 (B, A) = 4+3+2=9/3(no. of unique dates) ~3
df.groupby(['hour', 'date'])['staff'].nunique().reset_index()\
.groupby('hour')['staff'].mean().round()
>>> output
6 2.0
7 3.0
EDIT:
anky_91's solution in the comments is much faster and should definitely be used:
df.groupby(['date','hour'])['staff'].nunique().mean(level=1).round()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With