Calculate weighted average using a pandas/dataframe

Tags: python, pandas, numpy

I have the following table and want to calculate a weighted average for each date, using the formula below. I can do this with conventional looping code, but assuming the data is in a pandas DataFrame, is there an easier way to achieve this than iterating?

    Date        ID      wt      value   w_avg
    01/01/2012  100     0.50    60      0.791666667
    01/01/2012  101     0.75    80
    01/01/2012  102     1.00    100
    01/02/2012  201     0.50    100     0.722222222
    01/02/2012  202     1.00    80

01/01/2012 w_avg = 0.5 * (60 / sum(60, 80, 100)) + 0.75 * (80 / sum(60, 80, 100)) + 1.0 * (100 / sum(60, 80, 100))

01/02/2012 w_avg = 0.5 * (100 / sum(100, 80)) + 1.0 * (80 / sum(100, 80))


asked Oct 05 '14 18:10

mike01010

2 Answers

Let's first create the example pandas dataframe:

    In [1]: import numpy as np

    In [2]: import pandas as pd

    In [3]: index = pd.Index(['01/01/2012', '01/01/2012', '01/01/2012', '01/02/2012', '01/02/2012'], name='Date')

    In [4]: df = pd.DataFrame({'ID': [100, 101, 102, 201, 202], 'wt': [.5, .75, 1, .5, 1], 'value': [60, 80, 100, 100, 80]}, index=index)

Then, the average of 'wt' weighted by 'value' and grouped by the index is obtained as:

    In [5]: df.groupby(df.index).apply(lambda x: np.average(x.wt, weights=x.value))
    Out[5]:
    Date
    01/01/2012    0.791667
    01/02/2012    0.722222
    dtype: float64
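As a cross-check, the following is a minimal, self-contained sketch that rebuilds the question's data and compares the `np.average` result with the question's formula evaluated by hand. The names `result`, `expected_first`, and `expected_second` are introduced here for illustration only, and grouping with `level="Date"` plus an explicit column selection is one possible spelling of the same idea, not the only one.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with Date as the index.
index = pd.Index(
    ["01/01/2012", "01/01/2012", "01/01/2012", "01/02/2012", "01/02/2012"],
    name="Date",
)
df = pd.DataFrame(
    {
        "ID": [100, 101, 102, 201, 202],
        "wt": [0.5, 0.75, 1.0, 0.5, 1.0],
        "value": [60, 80, 100, 100, 80],
    },
    index=index,
)

# Average 'wt' weighted by 'value', separately for each date.
result = df.groupby(level="Date")[["wt", "value"]].apply(
    lambda g: np.average(g["wt"], weights=g["value"])
)

# The question's formula, written out by hand for each date.
expected_first = 0.5 * (60 / 240) + 0.75 * (80 / 240) + 1.0 * (100 / 240)
expected_second = 0.5 * (100 / 180) + 1.0 * (80 / 180)

print(result)
print(np.isclose(result.loc["01/01/2012"], expected_first))
print(np.isclose(result.loc["01/02/2012"], expected_second))
```

Note that the roles are swapped relative to the column names: the column called `wt` is the quantity being averaged, and `value` supplies the weights.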

Alternatively, one can also define a function:

    In [6]: def grouped_weighted_avg(values, weights, by):
       ...:     return (values * weights).groupby(by).sum() / weights.groupby(by).sum()

    In [7]: grouped_weighted_avg(values=df.wt, weights=df.value, by=df.index)
    Out[7]:
    Date
    01/01/2012    0.791667
    01/02/2012    0.722222
    dtype: float64
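The function works because the question's formula can be rearranged into a ratio of two per-group sums. A short derivation follows, where the subscripts i and j (notation introduced here, not in the original post) run over the rows of a single date:

```latex
\[
w_{\text{avg}}
  = \sum_i wt_i \cdot \frac{value_i}{\sum_j value_j}
  = \frac{\sum_i wt_i \, value_i}{\sum_j value_j}
\]

% Worked check against the question's data:
\[
\text{01/01/2012:}\quad
\frac{0.5 \cdot 60 + 0.75 \cdot 80 + 1.0 \cdot 100}{60 + 80 + 100}
  = \frac{190}{240} \approx 0.791667
\]
\[
\text{01/02/2012:}\quad
\frac{0.5 \cdot 100 + 1.0 \cdot 80}{100 + 80}
  = \frac{130}{180} \approx 0.722222
\]
```

The numerator corresponds to `(values * weights).groupby(by).sum()` and the denominator to `weights.groupby(by).sum()`, with `values=df.wt` and `weights=df.value`.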


answered Sep 19 '22 06:09

kadee

I think I would do this with two groupbys.

First, calculate each row's contribution to the "weighted average":

    In [11]: g = df.groupby('Date')

    In [12]: df.value / g.value.transform("sum") * df.wt
    Out[12]:
    0    0.125000
    1    0.250000
    2    0.416667
    3    0.277778
    4    0.444444
    dtype: float64

If you store this as a column, you can aggregate it with the groupby:

    In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt

Now the per-date sum of this column is the desired result:

    In [14]: g.wa.sum()
    Out[14]:
    Date
    01/01/2012    0.791667
    01/02/2012    0.722222
    Name: wa, dtype: float64

or, to repeat the result on every row of its date:

    In [15]: g.wa.transform("sum")
    Out[15]:
    0    0.791667
    1    0.791667
    2    0.791667
    3    0.722222
    4    0.722222
    Name: wa, dtype: float64
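The steps above can be put together as a self-contained sketch. It assumes `Date` is an ordinary column rather than the index, as the integer row labels in the output above suggest. The names `per_date` and the `w_avg` result column are chosen here for illustration, and the frame is re-grouped after adding `wa` purely as a cautious choice.

```python
import pandas as pd

# The question's data, with Date as a regular column (assumption).
df = pd.DataFrame(
    {
        "Date": ["01/01/2012", "01/01/2012", "01/01/2012", "01/02/2012", "01/02/2012"],
        "ID": [100, 101, 102, 201, 202],
        "wt": [0.5, 0.75, 1.0, 0.5, 1.0],
        "value": [60, 80, 100, 100, 80],
    }
)

# First groupby: each row's share of its date's total value, scaled by wt.
df["wa"] = df["value"] / df.groupby("Date")["value"].transform("sum") * df["wt"]

# Second groupby: add up the contributions.
per_date = df.groupby("Date")["wa"].sum()             # one number per date
df["w_avg"] = df.groupby("Date")["wa"].transform("sum")  # repeated on every row

print(per_date)
print(df)
```

`sum()` gives one row per date, which suits a summary table, while `transform("sum")` keeps the original shape, which suits writing the result back as a column like the `w_avg` column in the question.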

answered Sep 22 '22 06:09

Andy Hayden
