Pandas Group Weighted Average of Multiple Columns

Say I have the following dataframe:

>>> import numpy as np
>>> import pandas as pd
>>> df=pd.DataFrame({'category':['a','a','b','b'],
... 'var1':np.random.randint(0,100,4),
... 'var2':np.random.randint(0,100,4),
... 'weights':np.random.randint(0,10,4)})
>>> df
  category  var1  var2  weights
0        a    37    36        7
1        a    47    20        1
2        b    33     7        6
3        b    16     6        8

I can calculate the weighted average of 'var1' like this:

>>> Grouped=df.groupby('category')
>>> GetWeightAvg=lambda g: np.average(g['var1'], weights=g['weights'])
>>> Grouped.apply(GetWeightAvg)
category
a    38.250000
b    23.285714
dtype: float64
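
Because the frame above is built with np.random.randint, rerunning it gives different numbers. A minimal sketch, assuming the values printed above are copied in as literals (the names grp_a, manual_a and numpy_a are illustrative and not part of the original post), makes the result for group 'a' easy to check by hand:

```python
import numpy as np
import pandas as pd

# Literal copy of the frame printed above, so the results are reproducible.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "var1": [37, 47, 33, 16],
    "var2": [36, 20, 7, 6],
    "weights": [7, 1, 6, 8],
})

# Weighted mean of var1 for group "a", written out by hand:
# (37*7 + 47*1) / (7 + 1) = 306 / 8 = 38.25
grp_a = df[df["category"] == "a"]
manual_a = (grp_a["var1"] * grp_a["weights"]).sum() / grp_a["weights"].sum()

# np.average computes the same quantity.
numpy_a = np.average(grp_a["var1"], weights=grp_a["weights"])
```

Both manual_a and numpy_a come out to 38.25, matching the value shown for category a.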

However, I am wondering whether I can write my function so that, when I apply it to the grouped object, I specify which column (or both columns) to calculate the weighted average for. Rather than hard-coding 'var1' in the function, I'd like to pass the column name when applying it.

Just as I can get an unweighted average of both columns like this:

>>> Grouped[['var1','var2']].mean()
          var1  var2
category            
a         42.0  28.0
b         24.5   6.5

I'm wondering if there is a parallel way to do that with weighted averages.

With g = df.groupby('category'), you can use apply and return both averages as a Series:

In [11]: g.apply(lambda x: pd.Series(np.average(x[["var1", "var2"]], weights=x["weights"], axis=0), ["var1", "var2"]))
Out[11]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571
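
The lambda works because np.average accepts a 2-D input with 1-D weights when axis is given. A small sketch for group 'a' only, assuming the values from the question's printed frame (values_a, weights_a and col_means_a are illustrative names):

```python
import numpy as np

# Rows of group "a" from the question's frame; columns are var1, var2.
values_a = np.array([[37, 36],
                     [47, 20]])
weights_a = np.array([7, 1])

# With axis=0, the 1-D weights are applied down the rows,
# giving one weighted mean per column.
col_means_a = np.average(values_a, weights=weights_a, axis=0)
```

Here col_means_a holds 38.25 for var1 and 34.0 for var2, which is the first row of the output above. Wrapping the result in pd.Series with the column names is what lets apply assemble one labelled row per group.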

You could write this slightly cleaner as a function:

In [21]: def weighted(x, cols, w="weights"):
             return pd.Series(np.average(x[cols], weights=x[w], axis=0), cols)

In [22]: g.apply(weighted, ["var1", "var2"])
Out[22]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571
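
If apply turns out to be slow on many groups, one possible alternative is to compute the weighted sums with plain column arithmetic and two grouped sums. This is only a sketch under the assumption that the frame looks like the one in the question; weighted_mean_by_group is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

# Literal copy of the question's printed frame.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "var1": [37, 47, 33, 16],
    "var2": [36, 20, 7, 6],
    "weights": [7, 1, 6, 8],
})


def weighted_mean_by_group(frame, by, cols, w="weights"):
    """Hypothetical helper: per-group weighted mean of `cols` without apply."""
    # Multiply each value column by the weights, row by row.
    weighted_vals = frame[cols].mul(frame[w], axis=0)
    # Sum of weight * value within each group.
    numer = weighted_vals.groupby(frame[by]).sum()
    # Sum of the weights within each group.
    denom = frame.groupby(by)[w].sum()
    # Divide every column by the per-group weight total.
    return numer.div(denom, axis=0)


result = weighted_mean_by_group(df, "category", ["var1", "var2"])
```

For this frame, result should agree with the apply-based output above. One difference to be aware of: if a group's weights sum to zero, np.average raises an error, whereas this division yields NaN or inf for that group.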

Tags: python, pandas

Asked by AJG519; answered by Andy Hayden.