I have 2 dataframes like the following.
df_1
Index Fruit
1 Apple
2 Banana
3 Peach
df_2
Fruit Taste
Apple Tasty
Banana Tasty
Banana Rotten
Peach Rotten
Peach Tasty
Peach Tasty
I want to merge the two dataframes based on Fruit but only keeping the first occurrence of Apple, Banana, and Peach in the second dataframe. The final result should be:
df_output
Index Fruit Taste
1 Apple Tasty
2 Banana Tasty
3 Peach Rotten
Where Fruit, Index, and Taste are column headers. I tried something like df1.merge(df2,how='left',on='Fruit but it created extra rows based on the length of df_2
Thanks.
Use drop_duplicates for first rows:
df = df_1.merge(df_2.drop_duplicates('Fruit'),how='left',on='Fruit')
print (df)
Index Fruit Taste
0 1 Apple Tasty
1 2 Banana Tasty
2 3 Peach Rotten
If want add only one column faster is use map:
s = df_2.drop_duplicates('Fruit').set_index('Fruit')['Taste']
df_1['Taste'] = df_1['Fruit'].map(s)
print (df_1)
Index Fruit Taste
0 1 Apple Tasty
1 2 Banana Tasty
2 3 Peach Rotten
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With