I'm trying to take a DataFrame and create a new column based on the difference between each Value in a group and that group's max:
Group Value
A 4
A 6
A 10
B 5
B 8
B 11
I want to end up with a new column, "from_max":
from_max
6
4
0
6
3
0
I tried this but got a ValueError:
df['from_max'] = df.groupby(['Group']).apply(lambda x: x['Value'].max() - x['Value'])
Thanks in Advance
Option 1: vectorised groupby + transform
df['from_max'] = df.groupby('Group').Value.transform('max') - df.Value
df
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
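To reproduce Option 1 from scratch, here is a minimal sketch that rebuilds the sample frame from the question. The way df is constructed is an assumption; only the final assignment comes from the answer.

```python
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Value": [4, 6, 10, 5, 8, 11],
})

# transform("max") broadcasts each group's max back onto that group's rows,
# so the subtraction lines up row by row with df["Value"].
df["from_max"] = df.groupby("Group")["Value"].transform("max") - df["Value"]
print(df)
```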
Option 2: index-aligned subtraction
df['from_max'] = (df.groupby('Group').Value.max() - df.set_index('Group').Value).values
df
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
I think you need GroupBy.transform, which returns a Series the same size as the original DataFrame:
df['from_max'] = df.groupby(['Group'])['Value'].transform(lambda x: x.max() - x)
Or:
df['from_max'] = df.groupby(['Group'])['Value'].transform('max') - df['Value']
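The "same size" point is easier to see side by side. The sketch below (sample data rebuilt from the question, variable names are mine) contrasts a plain aggregation, which yields one value per group, with transform, which yields one value per original row. That difference in shape and index is probably also why the apply attempt in the question could not be assigned straight back to a column, though I have not reproduced the exact error.

```python
import pandas as pd

# Sample data from the question; per_group / per_row are illustrative names.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Value": [4, 6, 10, 5, 8, 11],
})
grouped = df.groupby("Group")["Value"]

per_group = grouped.max()           # one row per group, indexed by Group
per_row = grouped.transform("max")  # one row per original row, indexed like df

print(per_group)
print(per_row)
```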
An alternative is Series.map with the aggregated max:
df['from_max'] = df['Group'].map(df.groupby(['Group'])['Value'].max()) - df['Value']
print(df)
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
Using reindex
df['From_Max'] = df.groupby('Group').Value.max().reindex(df.Group).values - df.Value.values
df
Group Value From_Max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
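As a rough sanity check, the sketch below runs the transform, map and reindex variants on a frame whose rows are not sorted by group and confirms they agree. The shuffled data and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: same values as the question, but rows interleaved by group.
df = pd.DataFrame({
    "Group": ["B", "A", "B", "A", "A", "B"],
    "Value": [8, 4, 11, 10, 6, 5],
})
group_max = df.groupby("Group")["Value"].max()  # A -> 10, B -> 11

# Three ways of spreading the group max back over the rows.
via_transform = df.groupby("Group")["Value"].transform("max") - df["Value"]
via_map = df["Group"].map(group_max) - df["Value"]
via_reindex = group_max.reindex(df["Group"]).to_numpy() - df["Value"].to_numpy()

print(via_transform.tolist())
print(via_map.tolist())
print(via_reindex.tolist())
```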