How to ignore min & max value in group when calculating weighted mean by group in Pandas

I have a dataframe which looks like this

import pandas as pd

df = pd.DataFrame({'A': ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10'],
                   'B': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'W': [0.5, 0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.4, 0.5, 0.1],
                   'V': [9, 1, 7, 4, 3, 5, 2, 6, 8, 10]})
print(df)
     A  B    W   V
0   C1  A  0.5   9
1   C2  A  0.2   1
2   C3  A  0.3   7
3   C4  B  0.2   4
4   C5  B  0.1   3
5   C6  B  0.4   5
6   C7  B  0.3   2
7   C8  C  0.4   6
8   C9  C  0.5   8
9  C10  C  0.1  10

I want to calculate the weighted mean of column 'V' for each group in column 'B', ignoring the min and max value of 'V' within each group, where

column W = weight

column V = value

To calculate the weighted mean for each group considering all values I can do this:

df['mean'] = df.groupby('B').apply(lambda x: (x.V * (x.W / x.W.sum())).sum()).reindex(df.B).values
print(df)
     A  B    W   V  mean
0   C1  A  0.5   9   6.8
1   C2  A  0.2   1   6.8
2   C3  A  0.3   7   6.8
3   C4  B  0.2   4   3.7
4   C5  B  0.1   3   3.7
5   C6  B  0.4   5   3.7
6   C7  B  0.3   2   3.7
7   C8  C  0.4   6   7.4
8   C9  C  0.5   8   7.4
9  C10  C  0.1  10   7.4

However, I want to ignore the max and min value in each group when calculating the weighted mean by group. The result should look like this:

     A  B    W   V  meanNoMinMax
0   C1  A  0.5   9   7.0
1   C2  A  0.2   1   7.0
2   C3  A  0.3   7   7.0
3   C4  B  0.2   4   3.666667
4   C5  B  0.1   3   3.666667
5   C6  B  0.4   5   3.666667
6   C7  B  0.3   2   3.666667
7   C8  C  0.4   6   8.0
8   C9  C  0.5   8   8.0
9  C10  C  0.1  10   8.0

How can I achieve this with 1 line (or very few lines) of code?

Logic

Ignoring the min and max value of V in each group leaves the following rows, from which the weighted mean per group is calculated:

     A  B    W   V
2   C3  A  0.3   7
3   C4  B  0.2   4
4   C5  B  0.1   3
8   C9  C  0.5   8
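
The filtering step described under Logic can be sketched as follows. This is a minimal illustration, not part of the original question: the names `group_v`, `is_extreme` and `trimmed` are made up for the example, and it assumes that every row tied for a group's min or max should be dropped.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10'],
    'B': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'W': [0.5, 0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.4, 0.5, 0.1],
    'V': [9, 1, 7, 4, 3, 5, 2, 6, 8, 10],
})

# Broadcast each group's min and max of V back onto every row of that group.
group_v = df.groupby('B')['V']  # hypothetical helper name
is_extreme = df['V'].eq(group_v.transform('min')) | df['V'].eq(group_v.transform('max'))

# Rows that remain once each group's min and max are dropped.
trimmed = df[~is_extreme]
print(trimmed)
```

With the sample data this keeps rows C3, C4, C5 and C9, which are the rows the weighted mean should be computed from.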

idt_tt asked Aug 31 '20 18:08

1 Answer

Add the min and max conditions to your existing code:

df['mean'] = df.groupby('B').apply(lambda x: (x.V * (x.W[(x.V!=x.V.max()) & (x.V!=x.V.min())] / x.W[(x.V!=x.V.max()) & (x.V!=x.V.min())].sum())).sum()).reindex(df.B).values
df
Out[293]: 
     A  B    W   V      mean
0   C1  A  0.5   9  7.000000
1   C2  A  0.2   1  7.000000
2   C3  A  0.3   7  7.000000
3   C4  B  0.2   4  3.666667
4   C5  B  0.1   3  3.666667
5   C6  B  0.4   5  3.666667
6   C7  B  0.3   2  3.666667
7   C8  C  0.4   6  8.000000
8   C9  C  0.5   8  8.000000
9  C10  C  0.1  10  8.000000
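
If the one-liner is hard to read, the same idea can be spelled out in a few steps. The sketch below is an alternative written for illustration, not the answerer's code: it builds the mask once with `transform`, sums the weighted values and the weights per group, and maps the ratio back onto the rows. The names `keep`, `kept`, `weighted_sum` and `weight_total` are hypothetical, and it assumes all rows tied for a group's min or max are excluded.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10'],
    'B': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'W': [0.5, 0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.4, 0.5, 0.1],
    'V': [9, 1, 7, 4, 3, 5, 2, 6, 8, 10],
})

# True for rows whose V is neither the min nor the max of its group.
g = df.groupby('B')['V']
keep = df['V'].ne(g.transform('min')) & df['V'].ne(g.transform('max'))

# Weighted sum of V and total weight per group, over the kept rows only.
kept = df[keep]
weighted_sum = (kept['V'] * kept['W']).groupby(kept['B']).sum()
weight_total = kept.groupby('B')['W'].sum()

# Map each group's weighted mean back onto every row of that group.
df['meanNoMinMax'] = df['B'].map(weighted_sum / weight_total)
print(df)
```

A group with no rows left after filtering does not appear in `weighted_sum / weight_total`, so `map` leaves NaN for its rows.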

BENY answered Sep 21 '22 05:09