Let's say we have the following dataset:
import pandas as pd
data = [('apple', 'red', 155), ('apple', 'green', 102), ('apple', 'iphone', 48),
('tomato', 'red', 175), ('tomato', 'ketchup', 96), ('tomato', 'gun', 12)]
df = pd.DataFrame(data)
df.columns = ['word', 'rel_word', 'weight']
I would like to recompute weights, so that they sum up to 1.0 within each group (apple, tomato in the example) and keep related weights as is (e.g. apple/red to apple/green still should be 155/102).
Use transform
- faster than apply
and lookup
In [3849]: df['weight'] / df.groupby('word')['weight'].transform('sum')
Out[3849]:
0 0.508197
1 0.334426
2 0.157377
3 0.618375
4 0.339223
5 0.042403
Name: weight, dtype: float64
In [3850]: df['norm_w'] = df['weight'] / df.groupby('word')['weight'].transform('sum')
In [3851]: df
Out[3851]:
word rel_word weight norm_w
0 apple red 155 0.508197
1 apple green 102 0.334426
2 apple iphone 48 0.157377
3 tomato red 175 0.618375
4 tomato ketchup 96 0.339223
5 tomato gun 12 0.042403
Or,
In [3852]: df.groupby('word')['weight'].transform(lambda x: x/x.sum())
Out[3852]:
0 0.508197
1 0.334426
2 0.157377
3 0.618375
4 0.339223
5 0.042403
Name: weight, dtype: float64
Timings
In [3862]: df.shape
Out[3862]: (12000, 4)
In [3864]: %timeit df['weight'] / df.groupby('word')['weight'].transform('sum')
100 loops, best of 3: 2.44 ms per loop
In [3866]: %timeit df.groupby('word')['weight'].transform(lambda x: x/x.sum())
100 loops, best of 3: 5.16 ms per loop
In [3868]: %%timeit
...: group_weights = df.groupby('word').aggregate(sum)
...: df.apply(lambda row: row['weight']/group_weights.loc[row['word']][0],axis=1)
1 loop, best of 3: 2.5 s per loop
You can use groupby
to calculate the total weight of each group, and then apply a normalization lambda function to each row:
group_weights = df.groupby('word').aggregate(sum)
df['normalized_weights'] = df.apply(lambda row: row['weight']/group_weights.loc[row['word']][0],axis=1)
Output:
word rel_word weight normalized_weights
0 apple red 155 0.508197
1 apple green 102 0.334426
2 apple iphone 48 0.157377
3 tomato red 175 0.618375
4 tomato ketchup 96 0.339223
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With