
Calculate weighted average using a pandas/dataframe

I have the following table and want to calculate a weighted average for each date using the formula below. I can do this with conventional looping code, but assuming the data is in a pandas DataFrame, is there an easier way to achieve this than iterating over the rows?

Date        ID      wt      value   w_avg
01/01/2012  100     0.50    60      0.791666667
01/01/2012  101     0.75    80
01/01/2012  102     1.00    100
01/02/2012  201     0.50    100     0.722222222
01/02/2012  202     1.00    80

01/01/2012 w_avg = 0.5 * (60 / sum(60, 80, 100)) + 0.75 * (80 / sum(60, 80, 100)) + 1.0 * (100 / sum(60, 80, 100))

01/02/2012 w_avg = 0.5 * (100 / sum(100, 80)) + 1.0 * (80 / sum(100, 80))
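
As a quick sanity check of the formula, the arithmetic can be reproduced in plain Python without pandas. This is only a sketch: the `groups` dictionary and the `w_avg` helper are names made up for illustration, with the numbers copied from the table above.

```python
# (wt, value) pairs copied from the question's table, keyed by date.
groups = {
    "01/01/2012": [(0.50, 60), (0.75, 80), (1.00, 100)],
    "01/02/2012": [(0.50, 100), (1.00, 80)],
}


def w_avg(rows):
    """Sum of wt * (value / total value of the group)."""
    total = sum(value for _, value in rows)
    return sum(wt * value / total for wt, value in rows)


results = {date: w_avg(rows) for date, rows in groups.items()}
for date, result in results.items():
    print(date, round(result, 6))
# 01/01/2012 0.791667
# 01/02/2012 0.722222
```

Each result is the same as sum(wt * value) / sum(value) for the date, i.e. the average of wt weighted by value.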

mike01010 asked Oct 05 '14 18:10



2 Answers

Let's first create the example pandas dataframe:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: index = pd.Index(['01/01/2012','01/01/2012','01/01/2012','01/02/2012','01/02/2012'], name='Date')

In [4]: df = pd.DataFrame({'ID':[100,101,102,201,202],'wt':[.5,.75,1,.5,1],'value':[60,80,100,100,80]},index=index)

Then, the average of 'wt' weighted by 'value' and grouped by the index is obtained as:

In [5]: df.groupby(df.index).apply(lambda x: np.average(x.wt, weights=x.value))
Out[5]:
Date
01/01/2012    0.791667
01/02/2012    0.722222
dtype: float64

Alternatively, one can also define a function:

In [5]: def grouped_weighted_avg(values, weights, by):
   ...:     return (values * weights).groupby(by).sum() / weights.groupby(by).sum()

In [6]: grouped_weighted_avg(values=df.wt, weights=df.value, by=df.index)
Out[6]:
Date
01/01/2012    0.791667
01/02/2012    0.722222
dtype: float64
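
For readers who want to run this outside an IPython session, here is a self-contained sketch of the same sum-of-products idea. It assumes Date is an ordinary column rather than the index (a deviation from the session above), and the variable names `products`, `totals`, and `w_avg` are illustrative only. It also cross-checks each group against `np.average`.

```python
import numpy as np
import pandas as pd

# Same data as the question, but with Date as a regular column
# (an assumption; the session above keeps Date in the index).
df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "ID": [100, 101, 102, 201, 202],
    "wt": [0.5, 0.75, 1.0, 0.5, 1.0],
    "value": [60, 80, 100, 100, 80],
})

# Vectorized form: sum(wt * value) / sum(value) within each date.
products = (df["wt"] * df["value"]).groupby(df["Date"]).sum()
totals = df["value"].groupby(df["Date"]).sum()
w_avg = products / totals

# Cross-check every group against np.average with value as the weights.
for date, group in df.groupby("Date"):
    assert np.isclose(w_avg[date], np.average(group["wt"], weights=group["value"]))

print(w_avg)
```

The vectorized version avoids calling a Python function once per group, which may matter on frames with many groups.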
kadee answered Sep 19 '22 06:09


I think I would do this with two groupbys.

First, calculate each row's contribution to the "weighted average":

In [11]: g = df.groupby('Date')

In [12]: df.value / g.value.transform("sum") * df.wt
Out[12]:
0    0.125000
1    0.250000
2    0.416667
3    0.277778
4    0.444444
dtype: float64

If you set this as a column, you can groupby over it:

In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt 

Now the per-group sum of this column is the desired result:

In [14]: g.wa.sum()
Out[14]:
Date
01/01/2012    0.791667
01/02/2012    0.722222
Name: wa, dtype: float64

Or, to repeat the group result on every row:

In [15]: g.wa.transform("sum")
Out[15]:
0    0.791667
1    0.791667
2    0.791667
3    0.722222
4    0.722222
Name: wa, dtype: float64
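
The two-groupby steps can also be put together as a standalone script. This is a sketch under a few assumptions: Date is a regular column with a default integer index (as the outputs above suggest), and the names `total_value`, `contribution`, and `result` are chosen here for illustration. It uses `assign` so the original frame is left unmodified.

```python
import pandas as pd

# Same data as the question, with Date as a regular column.
df = pd.DataFrame({
    "Date": ["01/01/2012"] * 3 + ["01/02/2012"] * 2,
    "ID": [100, 101, 102, 201, 202],
    "wt": [0.5, 0.75, 1.0, 0.5, 1.0],
    "value": [60, 80, 100, 100, 80],
})

# First groupby: total value per date, broadcast back to every row.
total_value = df.groupby("Date")["value"].transform("sum")

# Each row's contribution: wt * (value / total value of its date).
contribution = df["wt"] * df["value"] / total_value

# Second groupby: sum the contributions per date and broadcast again,
# so every row carries the weighted average of its date.
result = df.assign(w_avg=contribution.groupby(df["Date"]).transform("sum"))

print(result)
```

Using `transform("sum")` keeps the result aligned with the original rows, which is convenient when the weighted average is needed as a column next to the data rather than as one value per date.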
Andy Hayden answered Sep 22 '22 06:09