Grouped By, Weighted, Column Averages in Pandas


So I have two value columns and two weight columns in a pandas DataFrame, and I want to generate a third column that is the grouped-by, weighted average of the two value columns.

So, for example, given:

import numpy as np
import pandas as pd

df = pd.DataFrame({'category':['a','a','b','b'],
  'var1':np.random.randint(0,100,4),
  'var2':np.random.randint(0,100,4),
  'weights1':np.random.random(4),
  'weights2':np.random.random(4)})
df
  category  var1  var2  weights1  weights2
0        a    84    45  0.955234  0.729862
1        a    49     5  0.225470  0.159662
2        b    77    95  0.957212  0.991960
3        b    27    65  0.491877  0.195680

I want to get:

df
  category  var1  var2  weights1  weights2    average
0        a    84    45  0.955234  0.729862  67.108023
1        a    49     5  0.225470  0.159662  30.759124
2        b    77    95  0.957212  0.991960  86.160443
3        b    27    65  0.491877  0.195680  37.814851

I've already done this using plain arithmetic operators:

df['average'] = df.groupby('category', group_keys=False) \
  .apply(lambda g: (g.weights1 * g.var1 + g.weights2 * g.var2) / (g.weights1 + g.weights2))

But I want to generalize it to use numpy.average, so that I could, for example, take the weighted average of three or more columns.

I tried something like this, but it doesn't work:

df['average'] = df.groupby('category', group_keys=False) \
  .apply(lambda g: np.average([g.var1, g.var2], axis=0, weights=[g.weights1, g.weights2]))

It raises:

TypeError: incompatible index of inserted column with frame index

Can anyone help me do this?

asked Apr 11 '19 23:04 by jtanman


1 Answer

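I don't think you need groupby here at all: the weighted average is computed row by row, so grouping by category doesn't change the result. Note that this matches the output of the apply + lambda version.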

Try this:

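col = df.drop(columns='category')
# Group the columns by their digit suffix (var1 with weights1, ...), multiply within each group, then sum per row
s = col.T.groupby(col.columns.str.findall(r'\d+').str[0]).prod().sum()
# Divide by each row's total weight
s / df.filter(like='weight').sum(axis=1)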
Out[33]: 
0    67.108014
1    30.759168
2    86.160444
3    37.814871
dtype: float64
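To see what the suffix trick does, here is a step-by-step sketch of the same idea with the intermediate grouping keys printed. It assumes every non-category column ends in a digit that pairs it with its weight (var1 with weights1, and so on), uses a transpose in place of groupby(..., axis=1), which recent pandas versions deprecate, and adds a hypothetical var3/weights3 pair to show that a third column needs no other change.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['a', 'a', 'b', 'b'],
    'var1': [84, 49, 77, 27],
    'var2': [45, 5, 95, 65],
    'var3': [10, 20, 30, 40],             # hypothetical third value column
    'weights1': [0.955234, 0.225470, 0.957212, 0.491877],
    'weights2': [0.729862, 0.159662, 0.991960, 0.195680],
    'weights3': [0.5, 0.25, 0.125, 1.0],  # hypothetical third weight column
})

col = df.drop(columns='category')

# First run of digits in each column name: '1', '2', '3', '1', '2', '3'.
# Columns sharing a suffix (var1 and weights1, ...) fall into the same group.
keys = col.columns.str.findall(r'\d+').str[0]
print(list(keys))

# After transposing, the columns are rows, so an ordinary groupby replaces
# groupby(axis=1). prod() yields var_i * weights_i for each suffix i, and
# sum() adds those products for each original row.
numerator = col.T.groupby(keys).prod().sum()
denominator = df.filter(like='weight').sum(axis=1)
average = numerator / denominator
print(average)
```

The trick relies entirely on the naming convention: a column without a digit, or two unrelated columns sharing a suffix, would be grouped incorrectly.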
answered Nov 15 '22 05:11 by BENY