Say I have the following dataframe:
>>> df=pd.DataFrame({'category':['a','a','b','b'],
... 'var1':np.random.randint(0,100,4),
... 'var2':np.random.randint(0,100,4),
... 'weights':np.random.randint(0,10,4)})
>>> df
  category  var1  var2  weights
0        a    37    36        7
1        a    47    20        1
2        b    33     7        6
3        b    16     6        8
I can calculate the weighted average of 'var1' like this:
>>> Grouped=df.groupby('category')
>>> GetWeightAvg=lambda g: np.average(g['var1'], weights=g['weights'])
>>> Grouped.apply(GetWeightAvg)
category
a    38.250000
b    23.285714
dtype: float64
However, I am wondering whether I can write the function so that I choose which column (or columns) to average at the point where I apply it to the grouped object, rather than hard-coding 'var1' inside the function.
Just as I can get an unweighted average of both columns like this:
>>> Grouped[['var1','var2']].mean()
          var1  var2
category
a         42.0  28.0
b         24.5   6.5
I'm wondering if there is a parallel way to do that with weighted averages.
How do you group by multiple columns in a pandas DataFrame and compute multiple aggregations? groupby() accepts a list of column names, so you can group by several columns at once, and agg() (alias aggregate()) applies one or more aggregation functions to the grouped data in a single call.
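As a minimal sketch of grouping by several columns and computing several aggregations at once, the example below uses named aggregation. The `sales` frame and its column names are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical data, made up for this illustration.
sales = pd.DataFrame({
    "region": ["east", "east", "east", "west"],
    "product": ["x", "x", "y", "x"],
    "units": [1, 3, 5, 7],
    "price": [10.0, 20.0, 30.0, 40.0],
})

# Group by two columns, then compute three aggregations in one call.
# Each keyword names an output column; the tuple is (input column, function).
summary = sales.groupby(["region", "product"]).agg(
    units_total=("units", "sum"),
    price_mean=("price", "mean"),
    price_max=("price", "max"),
)
print(summary)
```

The result is indexed by the (region, product) pairs that occur in the data, with one column per named aggregation.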
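To get the average (mean) of a column in a pandas DataFrame, use either the mean() or the describe() method. DataFrame.mean() returns the mean of the values along the requested axis; describe() reports the mean alongside other summary statistics such as the count, standard deviation, and quartiles.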
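A short sketch of both methods, reusing the 'var1' and 'var2' values from the question's dataframe (the variable names `scores`, `col_means`, `row_means`, and `stats` are illustrative):

```python
import pandas as pd

# The var1 / var2 values from the question's dataframe.
scores = pd.DataFrame({"var1": [37, 47, 33, 16], "var2": [36, 20, 7, 6]})

col_means = scores.mean()        # default axis=0: one mean per column
row_means = scores.mean(axis=1)  # axis=1: one mean per row
stats = scores.describe()        # summary table that includes a "mean" row

print(col_means)
print(stats.loc["mean"])
```

Both `col_means` and `stats.loc["mean"]` hold the same per-column means; describe() is the heavier call because it also computes the other statistics.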
Approach: take an existing DataFrame or build one, then define a function that computes the weighted average, that is, the sum of each value multiplied by its weight, divided by the sum of the weights. The DataFrame needs at least three items: an index (an item name, a date, or a similar label), a value, and a weight.
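The approach can be sketched as follows. The helper name `weighted_average` and the `items` frame are hypothetical; the formula is the usual sum(value * weight) / sum(weight), cross-checked here against np.average.

```python
import numpy as np
import pandas as pd

def weighted_average(frame, value_col, weight_col):
    """Return sum(value * weight) / sum(weight) for the given columns."""
    values = frame[value_col]
    weights = frame[weight_col]
    return (values * weights).sum() / weights.sum()

# Hypothetical data: an index (item name), a value, and a weight.
items = pd.DataFrame(
    {"value": [10, 20, 30], "weight": [1, 2, 1]},
    index=["apple", "pear", "plum"],
)

manual = weighted_average(items, "value", "weight")
# np.average with weights= computes the same quantity.
builtin = np.average(items["value"], weights=items["weight"])
print(manual, builtin)
```

Note that the manual version divides by the weight total directly, so the weights should not sum to zero.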
You can apply a function that returns both averages as a Series (here g is the grouped object, g = df.groupby("category")):
In [11]: g.apply(lambda x: pd.Series(np.average(x[["var1", "var2"]], weights=x["weights"], axis=0), ["var1", "var2"]))
Out[11]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571
You could write this slightly more cleanly as a function:
In [21]: def weighted(x, cols, w="weights"):
   ....:     return pd.Series(np.average(x[cols], weights=x[w], axis=0), cols)
In [22]: g.apply(weighted, ["var1", "var2"])
Out[22]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571
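For reference, here is a self-contained sketch of the same idea that can be run as a script. It hard-codes the values shown in the question's dataframe instead of drawing random ones. Selecting the needed columns before calling apply is an assumption made here so that the function never sees the grouping column, which newer pandas versions treat differently inside apply.

```python
import numpy as np
import pandas as pd

# The exact values shown in the question (fixed instead of random).
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "var1": [37, 47, 33, 16],
    "var2": [36, 20, 7, 6],
    "weights": [7, 1, 6, 8],
})

def weighted(x, cols, w="weights"):
    """Weighted average of each column in cols, weighted by column w."""
    return pd.Series(np.average(x[cols], weights=x[w], axis=0), cols)

# Keep only the columns the function needs; the group key stays as the index.
g = df.groupby("category")[["var1", "var2", "weights"]]

# Extra positional or keyword arguments to apply are forwarded to the function,
# so the columns are chosen at call time.
both = g.apply(weighted, ["var1", "var2"])
only_var1 = g.apply(weighted, cols=["var1"])

print(both)
print(only_var1)
```

Because the column list is an argument, the same function covers one column, both columns, or a different weight column via w.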