Calculating a weighted average with GroupBy.agg and named aggregation

Tags: python, pandas, group-by, functools

Pandas 0.25 supports "Named Aggregation" in GroupBy.agg: each keyword argument names an output column and takes a (column, aggfunc) pair, given either as a plain tuple or as the pd.NamedAgg namedtuple, as the documentation describes. It also says:

If your aggregation functions require additional arguments, partially apply them with functools.partial().
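A minimal sketch of what the documentation describes, built on the question's table: a plain (column, aggfunc) tuple, the equivalent pd.NamedAgg namedtuple, and functools.partial binding a constant extra argument. The q=50 argument to np.percentile and the MAX_QTY and MEDIAN_QTY column names are chosen here only for illustration.

```python
import functools

import numpy as np
import pandas as pd

t = pd.DataFrame({'bucket': ['a', 'a', 'b', 'b', 'b'],
                  'weight': [2, 3, 1, 4, 3],
                  'qty': [100, 500, 200, 800, 700]})

res = t.groupby('bucket').agg(
    # plain (column, aggfunc) tuple
    NR=('bucket', 'count'),
    # the same idea spelled with the pd.NamedAgg namedtuple
    MAX_QTY=pd.NamedAgg(column='qty', aggfunc='max'),
    # a *constant* extra argument (q=50) frozen with functools.partial
    MEDIAN_QTY=('qty', functools.partial(np.percentile, q=50)),
)
print(res)
```

Because q is a fixed number, the partial object works the same for every group.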

I would like to apply this principle to compute a weighted average alongside a simple count and mean. My input table is:

import pandas as pd

t = pd.DataFrame({'bucket':['a', 'a', 'b', 'b', 'b'], 'weight': [2, 3, 1, 4, 3], 
                  'qty': [100, 500, 200, 800, 700]})

and my query fails:

import functools
import numpy as np

t.groupby('bucket').agg(
        NR= ('bucket', 'count'),
        AVG_QTY= ('qty', np.mean),
        W_AVG_QTY= ('qty', functools.partial(np.average, weights='weight'))
   )

with an error message:

TypeError: 1D weights expected when shapes of a and weights differ.

I assume the problem is that I am binding the parameter to another column (passed by name) instead of to a constant? How can I make this work without the workaround of using apply with a lambda that returns a Series?
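A small check of that guess, using plain NumPy and the qty and weight values of bucket a: functools.partial simply stores the object it is given, here the literal string 'weight', so np.average never sees the weight column and raises a TypeError (the exact message depends on the NumPy version). Given the actual weight values, the same function returns the intended result.

```python
import functools

import numpy as np

qty_a = np.array([100, 500])   # 'qty' values of bucket 'a'
weight_a = np.array([2, 3])    # matching 'weight' values

# partial stores the literal string 'weight'; it knows nothing about columns.
bad = functools.partial(np.average, weights='weight')
print(bad.keywords)  # {'weights': 'weight'}

try:
    bad(qty_a)
except TypeError as exc:  # exact message depends on the NumPy version
    print('TypeError:', exc)

# With the real weight values, np.average computes the weighted mean.
print(np.average(qty_a, weights=weight_a))  # 340.0
```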

asked Dec 11 '19 16:12

Ferenc Bodon

1 Answer

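A weighted average needs two Series at once (the values and the weights), i.e. a DataFrame, but each function in a named aggregation receives only a single column's Series. Because of this, GroupBy.apply, which passes each group as a DataFrame, is the appropriate method here. Use pd.concat to join its result with the named aggregations.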
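A minimal probe, not part of the original answer, that records what each method passes to a user function; probe_agg and probe_apply are hypothetical helper names. On the question's table, the named-aggregation function only ever receives a Series, while the apply function receives each group as a DataFrame and can therefore read both qty and weight. Recent pandas versions may warn that apply still includes the grouping column; the result is unaffected.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame({'bucket': ['a', 'a', 'b', 'b', 'b'],
                  'weight': [2, 3, 1, 4, 3],
                  'qty': [100, 500, 200, 800, 700]})

seen_by_agg = []    # types received by a named-aggregation function
seen_by_apply = []  # types received by a GroupBy.apply function

def probe_agg(s):
    seen_by_agg.append(type(s))
    return len(s)

def probe_apply(gp):
    seen_by_apply.append(type(gp))
    # a DataFrame group exposes both columns, so the weighted mean is possible
    return np.average(gp['qty'], weights=gp['weight'])

t.groupby('bucket').agg(N=('qty', probe_agg))
w_avg = t.groupby('bucket').apply(probe_apply)

print(set(seen_by_agg))    # {<class 'pandas.core.series.Series'>}
print(set(seen_by_apply))  # {<class 'pandas.core.frame.DataFrame'>}
print(w_avg)
```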

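# Named aggregation for the one-column statistics, apply for the weighted mean;
# pd.concat aligns both results on the 'bucket' index
pd.concat([t.groupby('bucket').agg(NR = ('bucket', 'count'),
                                   AVG_QTY = ('qty', np.mean)),
           (t.groupby('bucket').apply(lambda gp: np.average(gp.qty, weights=gp.weight))
             .rename('W_AVG_QTY'))],
          axis=1)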

#        NR     AVG_QTY  W_AVG_QTY
#bucket                           
#a        2  300.000000      340.0
#b        3  566.666667      687.5

This can also be done with agg, provided your DataFrame has a unique index, though it may not be very performant because every group triggers a .loc lookup into the full DataFrame. We define our own function that accepts the group's Series of values plus the entire DataFrame; it then uses the Series' index to select the matching weights from the DataFrame.

def my_w_avg(s, df, wcol):
    # s holds one group's values; its index picks the matching weights from df
    return np.average(s, weights=df.loc[s.index, wcol])

t.groupby('bucket').agg(
        NR= ('bucket', 'count'),
        AVG_QTY= ('qty', np.mean),
        W_AVG_QTY= ('qty', functools.partial(my_w_avg, df=t, wcol='weight'))
   )

#        NR     AVG_QTY  W_AVG_QTY
#bucket                           
#a        2  300.000000      340.0
#b        3  566.666667      687.5
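A further option, which is a different technique from the two above rather than part of the answer: precompute qty * weight in a helper column (QTY_X_W, a hypothetical name), sum it and the weights per bucket with ordinary named aggregations, then divide. This needs neither apply nor the per-group .loc lookup. It is a sketch assuming qty and weight contain no missing values; otherwise the two sums would not cover the same rows.

```python
import pandas as pd

t = pd.DataFrame({'bucket': ['a', 'a', 'b', 'b', 'b'],
                  'weight': [2, 3, 1, 4, 3],
                  'qty': [100, 500, 200, 800, 700]})

res = (t.assign(QTY_X_W=t['qty'] * t['weight'])   # hypothetical helper column
        .groupby('bucket')
        .agg(NR=('bucket', 'count'),
             AVG_QTY=('qty', 'mean'),
             SUM_QW=('QTY_X_W', 'sum'),            # sum of qty * weight per bucket
             SUM_W=('weight', 'sum')))             # sum of weights per bucket

# weighted mean = sum(qty * weight) / sum(weight); then drop the helper sums
res['W_AVG_QTY'] = res['SUM_QW'] / res['SUM_W']
res = res.drop(columns=['SUM_QW', 'SUM_W'])
print(res)
```

On the question's table this gives 340.0 for bucket a and 687.5 for bucket b.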

answered Sep 30 '22 03:09

ALollz
