Pandas DataFrame conditional .mean() depending on values in certain columns

Tags: python, pandas, conditional, mean

I'm trying to create a new column that holds the mean of the values in an existing column of the same DataFrame. However, the mean should be computed per group, where the groups are defined by three other columns.

Out[184]: 
   YEAR daytype hourtype  scenario  option_value    
0  2015     SAT     of_h         0      0.134499       
1  2015     SUN     of_h         1     63.019250      
2  2015     WD      of_h         2     52.113516       
3  2015     WD      pk_h         3     43.126513       
4  2015     SAT     of_h         4     56.431392

Basically, I would like a new column 'mean' that contains the mean of 'option_value' over all rows where 'YEAR', 'daytype', and 'hourtype' are identical.

I tried the following approach but without success ...

In [185]: o2['premium']=o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()

TypeError: incompatible index of inserted column with frame index

asked Apr 21 '15 16:04

tpapz

2 Answers

Here's one way to do it

In [19]: def cust_mean(grp):
   ....:     grp['mean'] = grp['option_value'].mean()
   ....:     return grp
   ....:

In [20]: o2.groupby(['YEAR', 'daytype', 'hourtype']).apply(cust_mean)
Out[20]:
   YEAR daytype hourtype  scenario  option_value       mean
0  2015     SAT     of_h         0      0.134499  28.282946
1  2015     SUN     of_h         1     63.019250  63.019250
2  2015      WD     of_h         2     52.113516  52.113516
3  2015      WD     pk_h         3     43.126513  43.126513
4  2015     SAT     of_h         4     56.431392  28.282946

So, what was going wrong with your attempt?

It returns an aggregate with one row per group, indexed by the grouping keys, so its shape and index differ from those of the original DataFrame:

In [21]: o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
Out[21]:
YEAR  daytype  hourtype
2015  SAT      of_h        28.282946
      SUN      of_h        63.019250
      WD       of_h        52.113516
               pk_h        43.126513
Name: option_value, dtype: float64
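
To make the mismatch concrete, here is a minimal sketch that rebuilds the sample frame from the printed output in the question (the construction code and the names `keys` and `agg` are my own, not from the original post) and compares the two indexes:

```python
import pandas as pd

# Sample frame rebuilt by hand from the output printed in the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]  # hypothetical helper name
agg = o2.groupby(keys)["option_value"].mean()

# The frame has 5 rows; the aggregate has one row per distinct key combination.
print(len(o2), len(agg))  # 5 4

# The frame uses a plain integer index, the aggregate a 3-level MultiIndex,
# so a direct column assignment has nothing to align on.
print(o2.index.nlevels, agg.index.nlevels)  # 1 3
```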

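Or use transform, which returns a result with the same index as the original DataFrame, so it can be assigned directly: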

In [1461]: o2['premium'] = (o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value']
                              .transform('mean'))

In [1462]: o2
Out[1462]:
   YEAR daytype hourtype  scenario  option_value    premium
0  2015     SAT     of_h         0      0.134499  28.282946
1  2015     SUN     of_h         1     63.019250  63.019250
2  2015      WD     of_h         2     52.113516  52.113516
3  2015      WD     pk_h         3     43.126513  43.126513
4  2015     SAT     of_h         4     56.431392  28.282946
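
If you want to check this yourself, the following sketch (the frame is again rebuilt by hand from the question's output, and `keys` and `premium` are illustrative names) shows that the transformed Series keeps the original index and repeats the group mean on every row of the group:

```python
import pandas as pd

# Sample frame rebuilt by hand from the output printed in the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]
premium = o2.groupby(keys)["option_value"].transform("mean")

# Same index as the frame, so the assignment aligns row by row.
print(premium.index.equals(o2.index))  # True

o2["premium"] = premium

# Rows 0 and 4 share the key (2015, SAT, of_h) and therefore the same mean.
print(o2.loc[0, "premium"] == o2.loc[4, "premium"])  # True
```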

answered Oct 24 '22 00:10

Zero

You can do it the way you intended by tweaking your code in the following way:

o2 = o2.set_index(['YEAR', 'daytype', 'hourtype'])

o2['premium'] = o2.groupby(level=['YEAR', 'daytype', 'hourtype'])['option_value'].mean()

Why the original error? As explained in the other answer, the result of groupby().mean() does not have the same shape (length) as the original DataFrame.

Pandas can handle this cleverly if you first put the 'grouping columns' in the index. Then it knows how to propagate the mean data correctly, because the assignment aligns on the index.

The other answer's solution follows the same logic, because groupby naturally puts the grouping columns in the index during execution.
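
If you would rather keep the grouping columns as ordinary columns, the same index alignment can be sketched with DataFrame.join and its `on` argument. This is an alternative I am adding for illustration, not something from the question or the other answer; the frame is rebuilt by hand from the question's output, and `group_means` and `result` are made-up names:

```python
import pandas as pd

# Sample frame rebuilt by hand from the output printed in the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]

# One mean per group, indexed by the grouping keys; the Series name
# becomes the name of the joined column.
group_means = o2.groupby(keys)["option_value"].mean().rename("premium")

# Match the key columns of o2 against the MultiIndex of group_means.
result = o2.join(group_means, on=keys)

print(result)
```

The original integer index and the three key columns stay in place, at the cost of building the intermediate aggregate explicitly.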

answered Oct 24 '22 02:10

KieranPC
