Pandas groupby and average across unique values

Tags: python, pandas, dataframe, pandas-groupby

I have the following dataframe:

   ID ID2  SCORE  X  Y
0   0   a     10  1  2
1   0   b     20  2  3
2   0   b     20  3  4
3   0   b     30  4  5
4   1   c      5  5  6
5   1   d      6  6  7

What I would like to do is group by ID and ID2 and average SCORE, taking into account only the UNIQUE scores within each group.

Now, if I use the standard df.groupby(['ID', 'ID2'])['SCORE'].mean(), I get 23.33~ for the group (0, b), whereas what I am looking for is a score of 25.
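To make the discrepancy concrete, here is a minimal sketch that rebuilds the example frame and shows where the 23.33 comes from. The construction code and the names `plain_mean` and `target` are assumptions for illustration; only the values come from the table above.

```python
import pandas as pd

# Rebuild the example frame from the table in the question.
df = pd.DataFrame({
    "ID": [0, 0, 0, 0, 1, 1],
    "ID2": ["a", "b", "b", "b", "c", "d"],
    "SCORE": [10, 20, 20, 30, 5, 6],
    "X": [1, 2, 3, 4, 5, 6],
    "Y": [2, 3, 4, 5, 6, 7],
})

# Plain group mean: the duplicated 20 in group (0, 'b') is counted twice,
# so the result is (20 + 20 + 30) / 3.
plain_mean = df.groupby(["ID", "ID2"])["SCORE"].mean()
print(plain_mean.loc[(0, "b")])

# Desired value for the same group: each distinct score counted once,
# i.e. (20 + 30) / 2.
target = (20 + 30) / 2
print(target)
```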

I know I can filter out X and Y, drop the duplicates and do that, but I want to keep them as they are relevant.

How can I achieve that?

asked Oct 08 '17 13:10

bluesummers

1 Answer

If I understand correctly:

In [41]: df.groupby(['ID', 'ID2'])['SCORE'].agg(lambda x: x.unique().sum()/x.nunique())
Out[41]:
ID  ID2
0   a      10
    b      25
1   c       5
    d       6
Name: SCORE, dtype: int64

Or, a bit simpler:

In [43]: df.groupby(['ID', 'ID2'])['SCORE'].agg(lambda x: x.unique().mean())
Out[43]:
ID  ID2
0   a      10
    b      25
1   c       5
    d       6
Name: SCORE, dtype: int64
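Since X and Y should stay in the frame, one possible variation is to broadcast the per-group unique mean back onto every row with `transform`, or to de-duplicate only on the key and score columns before averaging. This is a sketch under the assumption that the frame looks like the one in the question; the column name `SCORE_UNIQUE_MEAN` and the variable names are made up for illustration.

```python
import pandas as pd

# Example frame, rebuilt from the table in the question.
df = pd.DataFrame({
    "ID": [0, 0, 0, 0, 1, 1],
    "ID2": ["a", "b", "b", "b", "c", "d"],
    "SCORE": [10, 20, 20, 30, 5, 6],
    "X": [1, 2, 3, 4, 5, 6],
    "Y": [2, 3, 4, 5, 6, 7],
})

# Option 1: keep every row (and X, Y) and attach the mean of the distinct
# scores of each (ID, ID2) group as a new column.
with_mean = df.assign(
    SCORE_UNIQUE_MEAN=df.groupby(["ID", "ID2"])["SCORE"]
                        .transform(lambda s: s.drop_duplicates().mean())
)

# Option 2: drop rows that repeat the same (ID, ID2, SCORE) triple on a
# temporary copy, then take an ordinary group mean. The original df is untouched.
unique_mean = (
    df.drop_duplicates(subset=["ID", "ID2", "SCORE"])
      .groupby(["ID", "ID2"])["SCORE"]
      .mean()
)

print(with_mean)
print(unique_mean)
```

Option 1 answers the "keep X and Y" requirement directly, while Option 2 gives one value per group, like the `agg` calls above.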

answered Oct 24 '22 08:10

MaxU - stop WAR against UA
