Pandas: Group by a column that meets a condition

Tags: python, pandas, dataframe, group-by, pandas-groupby

I have a data set with three columns: rating, breed, and dog.

import pandas as pd
dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}

df = pd.DataFrame(data=dogs)

I would like to calculate the mean rating per breed where dog is True. This would be the expected:

       breed  rating
0  Chihuahua     8.5
1  Dalmatian    10.0

This has been my attempt:

df.groupby('breed')['rating'].mean().where(dog == True)

And this is the error that I get:

NameError: name 'dog' is not defined

Whenever I try to add the where condition, I only get errors. Can anyone advise a solution? TIA

asked Jun 03 '18 01:06

seisgradox

2 Answers

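Once you group by breed and select the rating column, the dog column no longer exists in the result, so it cannot be used as a filter at that point. Even if it did, you are not accessing it correctly: the bare name dog is not a defined variable, which is what raises the NameError. A column has to be reached through the DataFrame, as in df.dog or df['dog'].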
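To make the point above concrete, here is a minimal sketch using the question's data. The variable name grouped is introduced only for illustration; it shows that the aggregated result is a Series indexed by breed, with no trace of the dog column.

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Aggregating first leaves only the mean ratings, indexed by breed.
grouped = df.groupby('breed')['rating'].mean()

print(type(grouped).__name__)   # Series
print(grouped.index.name)       # breed
print(list(grouped.index))      # ['Chihuahua', 'Dalmatian', 'Sphynx']

# The bare name `dog` is evaluated by Python before pandas is involved,
# so the failure is a plain NameError rather than a pandas error.
try:
    grouped.where(dog == True)
except NameError as exc:
    print(exc)                  # name 'dog' is not defined
```

Note that the Sphynx row is still present in grouped, and nothing in that Series records whether a breed was a dog, so the filter has to happen before the aggregation.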

Filter your DataFrame first, then use groupby with mean:

df[df.dog].groupby('breed')['rating'].mean().reset_index()

       breed  rating
0  Chihuahua     8.5
1  Dalmatian    10.0
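The same filter-then-group pattern can also be spelled with an explicit boolean mask and as_index=False, which keeps breed as a regular column so that reset_index is not needed. This is only a sketch of an equivalent form; the names mask and result are illustrative and not part of the original answer.

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Boolean mask: True for the rows that should take part in the aggregation.
mask = df['dog']

# Filter the rows first, then group what is left and average the ratings.
# as_index=False returns breed as a column instead of as the index.
result = df.loc[mask].groupby('breed', as_index=False)['rating'].mean()
print(result)
```

Because the mask is an ordinary boolean Series, it can be combined with other conditions (for example with & and |) before grouping.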

answered Oct 16 '22 23:10

user3483203

An alternative is to make dog one of your grouping keys, then filter by dog in a separate step. This is more efficient if you also want to keep the aggregated data for non-dogs, since the means are computed only once.

res = df.groupby(['dog', 'breed'])['rating'].mean().reset_index()

print(res)

     dog      breed  rating
0  False     Sphynx     7.0
1   True  Chihuahua     8.5
2   True  Dalmatian    10.0

print(res[res['dog']])

    dog      breed  rating
1  True  Chihuahua     8.5
2  True  Dalmatian    10.0
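As a rough sketch of the benefit described in this answer, the single aggregated result can be split into both halves without grouping again. The names dog_means and non_dog_means are hypothetical, chosen for this example.

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Aggregate once, keeping dog as a grouping key.
res = df.groupby(['dog', 'breed'])['rating'].mean().reset_index()

# Split the aggregated result; no second groupby is required.
dog_means = res[res['dog']].drop(columns='dog').reset_index(drop=True)
non_dog_means = res[~res['dog']].drop(columns='dog').reset_index(drop=True)

print(dog_means)
print(non_dog_means)
```

If only the dog rows are ever needed, filtering before grouping is simpler; this form pays off when both subsets are used.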

answered Oct 16 '22 22:10

jpp
