Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Groupby with weight

Given the following dataframe:

import pandas as pd
d=pd.DataFrame({'Age':[18,20,20,56,56],'Race':['A','A','A','B','B'],'Response':[3,2,5,6,2],'Weight':[0.5,0.5,0.5,1.2,1.2]})
d
    Age     Race    Response    Weight
0   18      A       3           0.5
1   20      A       2           0.5
2   20      A       5           0.5
3   56      B       6           1.2
4   56      B       2           1.2

I know that I can apply a group-by to get the count by age and race like this:

d.groupby(['Age','Race'])['Response'].count()
Age  Race
18   A       1
20   A       2
56   B       2
Name: Response, dtype: int64

But I'd like to use the "Weight" column to weight the cases such that the first 3 rows will count as 0.5 instead of 1 each and the last two will count as 1.2. So, if grouping by age and race, I should have the following:

Age  Race
18   A       0.5
20   A       1
56   B       2.4
Name: Response, dtype: int64

This is similar to using the "Weight Cases" option in SPSS. I know it's possible in R and I've seen a promising library in Python (though the current build is failing) here.

And PySal (not sure if it's applicable here)

...but I'm wondering if it can just be done somehow in the group-by.

like image 722
Dance Party Avatar asked Feb 23 '26 20:02

Dance Party


1 Answers

If I understand correctly, you're just looking for .sum() with the weights.

d.groupby(['Age', 'Race']).Weight.sum()

## Age  Race
## 18   A       0.5
## 20   A       1.0
## 56   B       2.4
## Name: Weight, dtype: float64
like image 96
miradulo Avatar answered Feb 25 '26 09:02

miradulo