I have data that looks like this:
d = {'id' : [1, 1, 1, 2, 2, 2],
'levels': ['low', 'perfect', 'high', 'low', 'perfect', 'high'],
'value': [1, 10, 13, 2, 10, 13]}
df = pd.DataFrame(d, columns=['id', 'levels', 'value'])
df = df.groupby(['id','levels'])[['value']].mean()
For each [id, levels], I want to find the difference between the value
of the row and the value
of the perfect
row. It would look like this:
id | levels | value | penalty
1 | high | 13 | 3
| low | 1 | 9
| perfect| 10 | 0
2 | high | 13 | 3
| low | 2 | 8
| perfect| 10 | 0
For example, in the first row, you would subtract 13 from the perfect value, which is 10, to get 3.
So how do I make a calculation where I find the perfect
value for each [id, levels]
, and then find the differences?
Select the cross section of dataframe using xs
, then subtract this cross section from the given dataframe on level=0
df['penalty'] = df['value'].sub(df['value'].xs('perfect', level=1)).abs()
value penalty
id levels
1 high 13 3
low 1 9
perfect 10 0
2 high 13 3
low 2 8
perfect 10 0
You can try transform and then subtract and convert to absolute:
val = df.loc[df['levels'].eq('perfect').groupby(df['id']).transform('idxmax'),'value']
df['penalty'] = df['value'].sub(val.to_numpy()).abs()
print(df)
id levels value penalty
0 1 low 1 9
1 1 perfect 10 0
2 1 high 13 3
3 2 low 2 8
4 2 perfect 10 0
5 2 high 13 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With