I have the following table. I want to calculate a weighted average for each date, using the formula below. I can do this with conventional loop-based code, but assuming the data is in a pandas DataFrame, is there an easier way to achieve this than iteration?
Date        ID   wt    value  w_avg
01/01/2012  100  0.50   60    0.791666667
01/01/2012  101  0.75   80
01/01/2012  102  1.00  100
01/02/2012  201  0.50  100    0.722222222
01/02/2012  202  1.00   80
01/01/2012: w_avg = 0.50 * (60 / sum(60, 80, 100)) + 0.75 * (80 / sum(60, 80, 100)) + 1.00 * (100 / sum(60, 80, 100))
01/02/2012: w_avg = 0.50 * (100 / sum(100, 80)) + 1.00 * (80 / sum(100, 80))
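Both lines are instances of one general formula. Writing I_d for the set of rows on date d (notation introduced here only for illustration), and checking the arithmetic against the table:

```latex
\begin{aligned}
\text{w\_avg}_d &= \sum_{i \in I_d} \mathrm{wt}_i \cdot \frac{\mathrm{value}_i}{\sum_{j \in I_d} \mathrm{value}_j}
                 = \frac{\sum_{i \in I_d} \mathrm{wt}_i \, \mathrm{value}_i}{\sum_{i \in I_d} \mathrm{value}_i} \\
\text{w\_avg}_{\text{01/01/2012}} &= \frac{0.5 \cdot 60 + 0.75 \cdot 80 + 1.0 \cdot 100}{240} = \frac{190}{240} \approx 0.791667 \\
\text{w\_avg}_{\text{01/02/2012}} &= \frac{0.5 \cdot 100 + 1.0 \cdot 80}{180} = \frac{130}{180} \approx 0.722222
\end{aligned}
```

In other words, w_avg is the mean of wt weighted by value, computed separately for each date.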
Approach: take an existing DataFrame or build one, then define a function that computes the weighted average with the formula above. The DataFrame needs at least three pieces of information: a grouping key (an item name, a date, or any similar variable, possibly stored as the index), the values, and the weights.
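As a baseline, here is a minimal plain-Python sketch of such a function, the kind of loop-based code the question wants to avoid. It is handy for checking the pandas versions below. The name grouped_weighted_mean and its parameters are hypothetical; it computes sum(x * w) / sum(w) per key, which is the formula above with x = wt and w = value.

```python
from collections import defaultdict


def grouped_weighted_mean(keys, x, w):
    """Hypothetical helper: for each key, return sum(x_i * w_i) / sum(w_i)."""
    numerator = defaultdict(float)    # running sum of x * w for each key
    denominator = defaultdict(float)  # running sum of w for each key
    for key, xi, wi in zip(keys, x, w):
        numerator[key] += xi * wi
        denominator[key] += wi
    return {key: numerator[key] / denominator[key] for key in numerator}


# Data from the question's table.
dates = ["01/01/2012"] * 3 + ["01/02/2012"] * 2
wt = [0.50, 0.75, 1.00, 0.50, 1.00]
value = [60, 80, 100, 100, 80]

result = grouped_weighted_mean(dates, x=wt, w=value)
print(result)
```

This makes a single pass over the rows, so it is linear in the number of rows, but every step runs as Python bytecode rather than vectorized pandas operations.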
Calculate a Weighted Average in Pandas Using NumPy

The NumPy library has a function, average(), that takes an optional argument specifying the weights of the values. The data array goes into the argument a=, and a matching array of weights goes into the argument weights=.
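A small sketch of np.average on the 01/01/2012 rows from the question. Here wt is the data being averaged and value supplies the weights, so the result should match the 0.791667 in the table. The returned=True option additionally returns the sum of the weights.

```python
import numpy as np

# Rows for 01/01/2012 from the question's table.
wt = np.array([0.50, 0.75, 1.00])
value = np.array([60, 80, 100])

# Average wt using value as the weights: sum(wt * value) / sum(value).
w_avg = np.average(a=wt, weights=value)

# returned=True also gives the sum of the weights (the denominator).
w_avg_again, total = np.average(wt, weights=value, returned=True)
print(w_avg, total)
```

Note that np.average raises ZeroDivisionError if the weights sum to zero, so a date whose values are all zero needs special handling.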
To find a weighted average, multiply each number by its weight, then add the results. If the weights don't add up to one, divide that sum of weighted values by the sum of the weights.
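The same rule written as plain arithmetic on the 01/01/2012 rows. It shows that dividing by the total weight gives the same result as first rescaling the weights so they sum to one. That rescaling is exactly the value / sum(value) term in the question's formula.

```python
# 01/01/2012 rows: the quantity being averaged (wt) and its weights (value).
wt = [0.50, 0.75, 1.00]
value = [60, 80, 100]

# Weights do not sum to one: multiply, add, then divide by the total weight.
total = sum(value)
unnormalized = sum(x * w for x, w in zip(wt, value)) / total

# Equivalent: rescale the weights to sum to one, then just multiply and add.
shares = [w / total for w in value]
normalized = sum(x * s for x, s in zip(wt, shares))

print(unnormalized, normalized)
```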
To get a column's average (mean) from a pandas DataFrame, use either the mean() or the describe() method. DataFrame.mean() returns the mean of the values over the requested axis. Both give plain, unweighted means.
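A short sketch contrasting these unweighted means with the weighted result. For 01/01/2012 the plain mean of wt is 0.75, while the value-weighted mean in the question is about 0.791667. For this problem, mean() and describe() are therefore only useful as a sanity check.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})

# Plain column mean, computed two ways.
col_mean = df["wt"].mean()
described_mean = df["wt"].describe()["mean"]

# Unweighted mean per date: every row counts equally, so 'value' is ignored.
per_date_mean = df.groupby("Date")["wt"].mean()

print(col_mean, described_mean)
print(per_date_mean)
```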
Let's first create the example pandas dataframe:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: index = pd.Index(['01/01/2012', '01/01/2012', '01/01/2012', '01/02/2012', '01/02/2012'], name='Date')
In [4]: df = pd.DataFrame({'ID': [100, 101, 102, 201, 202], 'wt': [.5, .75, 1, .5, 1], 'value': [60, 80, 100, 100, 80]}, index=index)
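The transform-based answer further down prints a default 0 to 4 row index, which suggests it kept Date as an ordinary column instead of the index. A sketch, under that assumption, of converting the frame built above with reset_index(). It also shows that groupby accepts the name 'Date' either as a column or as an index level.

```python
import pandas as pd

# The index-based frame from the session above.
index = pd.Index(["01/01/2012"] * 3 + ["01/02/2012"] * 2, name="Date")
df_indexed = pd.DataFrame(
    {"ID": [100, 101, 102, 201, 202],
     "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
     "value": [60, 80, 100, 100, 80]},
    index=index,
)

# Move Date from the index into a regular column; rows get a default 0..4 index.
df = df_indexed.reset_index()

# Grouping by the column and by the index level name gives the same totals.
by_column = df.groupby("Date")["value"].sum()
by_level = df_indexed.groupby(level="Date")["value"].sum()
print(df)
```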
Then, the average of 'wt' weighted by 'value' and grouped by the index is obtained as:
In [5]: df.groupby(df.index).apply(lambda x: np.average(x.wt, weights=x.value))
Out[5]:
Date
01/01/2012    0.791667
01/02/2012    0.722222
dtype: float64
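A self-contained variant of the same apply call. It groups on the index level by name and passes only the two columns the lambda uses. Each x handed to the lambda is the sub-DataFrame for one date.

```python
import numpy as np
import pandas as pd

index = pd.Index(["01/01/2012"] * 3 + ["01/02/2012"] * 2, name="Date")
df = pd.DataFrame(
    {"ID": [100, 101, 102, 201, 202],
     "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
     "value": [60, 80, 100, 100, 80]},
    index=index,
)

# One np.average call per date; x holds only that date's wt and value columns.
w_avg = df.groupby(level="Date")[["wt", "value"]].apply(
    lambda x: np.average(x["wt"], weights=x["value"])
)
print(w_avg)
```

Because apply calls a Python function once per group, it is typically slower than the purely vectorized groupby sums when there are many dates.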
Alternatively, one can also define a function:
In [5]: def grouped_weighted_avg(values, weights, by):
   ...:     return (values * weights).groupby(by).sum() / weights.groupby(by).sum()

In [6]: grouped_weighted_avg(values=df.wt, weights=df.value, by=df.index)
Out[6]:
Date
01/01/2012    0.791667
01/02/2012    0.722222
dtype: float64
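The same function also works when the grouping key is an ordinary column rather than the index, because groupby accepts any Series aligned with the data. A sketch assuming Date is stored as a column; the rename to "w_avg" is only there to match the question's column name.

```python
import pandas as pd


def grouped_weighted_avg(values, weights, by):
    # Per group: sum of values * weights, divided by the sum of weights.
    return (values * weights).groupby(by).sum() / weights.groupby(by).sum()


df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})

# Group by the Date column instead of the index.
w_avg = grouped_weighted_avg(values=df["wt"], weights=df["value"], by=df["Date"]).rename("w_avg")
print(w_avg)
```

Both the numerator and the denominator are single vectorized groupby sums, so no Python-level function runs per group.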
I think I would do this with two groupby operations: a transform, then a sum.
First, calculate each row's contribution to the weighted average:
In [11]: g = df.groupby('Date')

In [12]: df.value / g.value.transform("sum") * df.wt
Out[12]:
0    0.125000
1    0.250000
2    0.416667
3    0.277778
4    0.444444
dtype: float64
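Why this works: transform("sum") returns one group total per row, aligned with the original rows. That lets the division produce value / sum(value) for each row's own date. A sketch assuming Date is a regular column:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})
g = df.groupby("Date")

# Same length as df: each row gets its own date's total value.
group_totals = g["value"].transform("sum")

# Each row's share of its date's total value, multiplied by its wt.
contrib = df["value"] / group_totals * df["wt"]
print(pd.DataFrame({"group_total": group_totals, "contrib": contrib}))
```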
If you store this as a column, you can aggregate it with the groupby:
In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt
Now the per-date sum of this column is the desired weighted average:
In [14]: g.wa.sum()
Out[14]:
Date
01/01/2012    0.791667
01/02/2012    0.722222
Name: wa, dtype: float64
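If you would rather not add a helper column to df, the same two steps can be chained with assign. assign returns a modified copy and leaves the original frame untouched. A sketch assuming Date is a regular column:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})

w_avg = (
    # Per-row contribution, computed on a copy of df.
    df.assign(wa=df["value"] / df.groupby("Date")["value"].transform("sum") * df["wt"])
    # Sum the contributions for each date.
    .groupby("Date")["wa"]
    .sum()
)
print(w_avg)
```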
or, to broadcast the result back onto every row:
In [15]: g.wa.transform("sum")
Out[15]:
0    0.791667
1    0.791667
2    0.791667
3    0.722222
4    0.722222
Name: wa, dtype: float64
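The broadcast form makes it easy to rebuild the question's table, where w_avg appears only on the first row of each date. A sketch, assuming Date is a regular column and that blank cells can be represented as NaN:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "ID": [100, 101, 102, 201, 202],
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})

# Per-row contribution, then the per-date total broadcast back to every row.
df["wa"] = df["value"] / df.groupby("Date")["value"].transform("sum") * df["wt"]
df["w_avg"] = df.groupby("Date")["wa"].transform("sum")

# Keep w_avg only on the first row of each date (later rows become NaN).
df["w_avg"] = df["w_avg"].where(~df["Date"].duplicated())
df = df.drop(columns="wa")
print(df)
```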