I have some DataFrame:
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'apple', 'apple', 'apple', 'orange', 'orange', 'orange', 'orange', 'orange', 'orange'],
                   'distance': [10, 0, 20, 40, 20, 50, 70, 90, 110, 130]})
df
fruit distance
0 apple 10
1 apple 0
2 apple 20
3 apple 40
4 orange 20
5 orange 50
6 orange 70
7 orange 90
8 orange 110
9 orange 130
I would like to add a unique ID to each group member sorted by distance, like this:
fruit distance ID
0 apple 10 apple_2
1 apple 0 apple_1
2 apple 20 apple_3
3 apple 40 apple_4
4 orange 20 orange_1
5 orange 50 orange_2
6 orange 70 orange_3
7 orange 130 orange_6
8 orange 110 orange_5
9 orange 90 orange_4
My efforts to sort/groupby/loop have not yet been successful.
Using pandas.DataFrame.groupby.rank:
df['ID'] = df['fruit'] + "_" + df.groupby("fruit")["distance"].rank().astype(int).astype(str)
print(df)
Output:
fruit distance ID
0 apple 10 apple_2
1 apple 0 apple_1
2 apple 20 apple_3
3 apple 40 apple_4
4 orange 20 orange_1
5 orange 50 orange_2
6 orange 70 orange_3
7 orange 90 orange_4
8 orange 110 orange_5
9 orange 130 orange_6
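One caveat with the rank approach: rank() defaults to method="average", so two rows with the same distance inside one fruit would share a fractional rank (for example 1.5), and astype(int) would truncate both to the same number. A minimal sketch of a tie-safe variant, assuming ties should be broken by order of appearance; the sample data with a duplicate distance is made up for illustration and is not from the question:

```python
import pandas as pd

# Hypothetical data with a tie: two apples at distance 10.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "apple", "orange", "orange"],
    "distance": [10, 10, 0, 50, 20],
})

# method="first" ranks tied values in order of appearance,
# so every rank within a group is a distinct whole number.
ranks = df.groupby("fruit")["distance"].rank(method="first").astype(int)

df["ID"] = df["fruit"] + "_" + ranks.astype(str)
print(df)
```

With the default method, both tied apple rows would end up with the same ID; with method="first" each row gets its own.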
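IIUC, you can sort, followed by groupby and cumcount, and then string concatenation.
I'm not sure about the ordering at the end of your expected output (rows 7 to 9 list distances 130, 110, 90, which differs from your input), but this should work: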
nums = (df.sort_values(["fruit", "distance"]).groupby(["fruit"]).cumcount() + 1).astype(str)
df['ID'] = df['fruit'] + '_' + nums
print(df)
fruit distance ID
0 apple 10 apple_2
1 apple 0 apple_1
2 apple 20 apple_3
3 apple 40 apple_4
4 orange 20 orange_1
5 orange 50 orange_2
6 orange 70 orange_3
7 orange 90 orange_4
8 orange 110 orange_5
9 orange 130 orange_6
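The assignment works even though the numbering is computed on a sorted copy, because cumcount keeps the original index labels and pandas aligns on the index when the pieces are combined. A rough sketch that wraps the same idea in a reusable function; the name add_group_ids and its parameters are hypothetical, chosen here only for illustration:

```python
import pandas as pd

def add_group_ids(df, group_col, sort_col, ascending=True):
    """Return a copy of df with an ID column numbering rows within each group."""
    # Sort by group first, then by the ordering column inside each group.
    ordered = df.sort_values([group_col, sort_col], ascending=[True, ascending])
    # cumcount numbers rows 0..n-1 per group and keeps the original index labels.
    position = ordered.groupby(group_col).cumcount() + 1
    out = df.copy()
    # Series addition and column assignment both align on the index,
    # so each number lands on the row it was computed for.
    out["ID"] = out[group_col].astype(str) + "_" + position.astype(str)
    return out

df = pd.DataFrame({
    "fruit": ["apple"] * 4 + ["orange"] * 6,
    "distance": [10, 0, 20, 40, 20, 50, 70, 90, 110, 130],
})

result = add_group_ids(df, "fruit", "distance")
print(result)
```

Passing ascending=False would number the largest distance in each group as 1 instead.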