I have the following dataframe
ID ID2 SCORE X Y
0 0 a 10 1 2
1 0 b 20 2 3
2 0 b 20 3 4
3 0 b 30 4 5
4 1 c 5 5 6
5 1 d 6 6 7
What I would like to do, is to groupby ID
and ID2
and to average the SCORE
taking into consideration only UNIQUE scores.
Now, if I use the standard df.groupby(['ID', 'ID2'])['SCORE'].mean()
I would get 23.33~, where what I am looking for is a score of 25.
I know I can filter out X
and Y
, drop the duplicates and do that, but I want to keep them as they are relevant.
How can I achieve that?
Pandas Groupby Mean To get the average (or mean) value of in each group, you can directly apply the pandas mean() function to the selected columns from the result of pandas groupby.
You can use the nunique() function to count the number of unique values in a pandas DataFrame.
The output of number of unique values is returned. In this example, length of array returned by unique() method is compared to integer returned by nunique() method. Output: The output is not same in both of the cases as dropna parameter is set to True and hence NULL values were excluded while counting unique values.
If i understand correctly:
In [41]: df.groupby(['ID', 'ID2'])['SCORE'].agg(lambda x: x.unique().sum()/x.nunique())
Out[41]:
ID ID2
0 a 10
b 25
1 c 5
d 6
Name: SCORE, dtype: int64
or bit easier:
In [43]: df.groupby(['ID', 'ID2'])['SCORE'].agg(lambda x: x.unique().mean())
Out[43]:
ID ID2
0 a 10
b 25
1 c 5
d 6
Name: SCORE, dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With