Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group by one column and find the no. of unique values in the other column

I have a dataframe as below,

         date  hour staff
0  2019-10-01     6     A
1  2019-10-01     6     B
2  2019-10-01     6     C
3  2019-10-02     6     D
4  2019-10-02     6     B
5  2019-10-02     6     A
6  2019-10-03     6     B
7  2019-10-03     6     B
8  2019-10-03     6     B
9  2019-10-01     7     D
10 2019-10-01     7     A
11 2019-10-01     7     B
12 2019-10-01     7     C
13 2019-10-02     7     D
14 2019-10-02     7     C
15 2019-10-02     7     A
16 2019-10-03     7     B
17 2019-10-03     7     B
18 2019-10-03     7     A

I want to compute the average of unique staffs per hour, like below

hour     unique_staff
6            2
7            3

Explanation :
At hour 6,
Unique_staff= 2
Oct 1st: 3(A,B,C)+ Oct 2nd: 3(D,B,A)+ Oct 3rd: 1 (B) = 3+3+1=7/3(no. of unique dates) ~2

At hour 7,
Unique_staff= 3
Oct 1st: 4(D,A,B,C)+ Oct 2nd: 3(D,C,A)+ Oct 3rd: 2 (B, A) = 4+3+2=9/3(no. of unique dates) ~3

like image 743
Swasti Avatar asked Nov 25 '25 23:11

Swasti


1 Answers

df.groupby(['hour', 'date'])['staff'].nunique().reset_index()\
  .groupby('hour')['staff'].mean().round()

>>> output

6   2.0
7   3.0

EDIT:

anky_91's solution in the comments is much faster and should definitely be used:

df.groupby(['date','hour'])['staff'].nunique().mean(level=1).round()
like image 184
Brian Avatar answered Nov 27 '25 14:11

Brian



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!