Merge Duplicates based on column?

Here's my situation -

In[9]: df
Out[9]: 
    fruit  val1  val2
0  Orange     1     1
1  orANGE     2     2
2   apple     3     3
3   APPLE     4     4
4   mango     5     5
5   appLE     6     6

In[10]: type(df)
Out[10]: pandas.core.frame.DataFrame

How do I remove case-insensitive duplicates so that the resulting fruit values are all lowercase, with val1 being the sum of each group's val1 values and val2 the sum of its val2 values?

Expected result:

    fruit  val1  val2
0  orange     3     3
1   apple    13    13
2   mango     5     5

ComputerFellow asked Feb 15 '23 05:02


1 Answer

In two steps:

df['fruit'] = df['fruit'].map(lambda x: x.lower())

res = df.groupby('fruit').sum()

res    
#         val1  val2
# fruit             
# apple     13    13
# mango      5     5
# orange     3     3

And to recover your structure:

res.reset_index()
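
Note that groupby sorts the group keys by default, so the result above comes out in alphabetical order (apple, mango, orange), while the expected result in the question lists them in order of first appearance (orange, apple, mango). If that order matters, something like the following sketch should work. It assumes the frame from the question is rebuilt by hand, and uses the sort and as_index parameters of groupby:

```python
import pandas as pd

# Rebuild the frame from the question (values copied from its In[9] output).
df = pd.DataFrame({
    "fruit": ["Orange", "orANGE", "apple", "APPLE", "mango", "appLE"],
    "val1": [1, 2, 3, 4, 5, 6],
    "val2": [1, 2, 3, 4, 5, 6],
})

# Normalize the case first so that all spellings fall into the same group.
df["fruit"] = df["fruit"].str.lower()

# sort=False keeps the groups in order of first appearance;
# as_index=False keeps 'fruit' as a regular column, so reset_index() is not needed.
res = df.groupby("fruit", sort=False, as_index=False).sum()
print(res)
```

With as_index=False the result already has the flat fruit / val1 / val2 layout, so the separate reset_index() step can be dropped.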

As per the comment, the lowercasing can be done more straightforwardly with the vectorized string method:

df['fruit'] = df['fruit'].str.lower()
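
Both variants overwrite the fruit column in df. If the original spellings need to be kept, one option is to lowercase on a copy via assign and aggregate in a single chain. This is only a sketch, with the question's frame rebuilt by hand for the sake of a runnable example:

```python
import pandas as pd

# Rebuild the frame from the question (values copied from its In[9] output).
df = pd.DataFrame({
    "fruit": ["Orange", "orANGE", "apple", "APPLE", "mango", "appLE"],
    "val1": [1, 2, 3, 4, 5, 6],
    "val2": [1, 2, 3, 4, 5, 6],
})

# assign() returns a new frame with the lowercased column; df itself is untouched.
res = (
    df.assign(fruit=df["fruit"].str.lower())
      .groupby("fruit", as_index=False)  # keep 'fruit' as a regular column
      .sum()
)
print(res)
```

Here df still holds the mixed-case names afterwards, and res has one row per lowercased fruit.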

Justin answered Feb 17 '23 09:02