Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining different parts of two rows in a dataframe using Pandas

I have a large (>10000 rows) dataframe like below, and I have to combine the rows with the same name by col_1, col_2, col_3 of group a and col_4, col_5 of group b or group c (b & c won't exist at the same time).

I could just find online how to combine (add up) every column of the rows, put not in different parts.

Original dataframe

name group col_1 col_2 col_3 col_4 col_5
ann a 1 NaN 2 3 NaN
ann b 1 NaN NaN 2 3
ben a 0 1 2 1 NaN
ben c 0 NaN NaN 3 NaN
cat a 2 3 NaN NaN NaN

Desired result

name col_1 col_2 col_3 col_4 col_5
ann 1 NaN 2 2 3
ben 0 1 2 3 NaN
cat 2 3 NaN NaN NaN
like image 773
blackcat Avatar asked Nov 24 '25 15:11

blackcat


1 Answers

A bit of an ad-hoc approach, but one way you could do this is by sorting your dataframe by name and group, then split up your df into 2 parts (i.e. col1/2/3 for group a), perform a groupby and use first() on the first df and last() on the second, and then put back together using concat:

import pandas as pd

df.sort_values(['name','group'],ascending=True,inplace=True)

one = df.iloc[:,:5]
two = pd.concat([df.iloc[:,:2],df[['col_4','col_5']]],axis=1)

def my_func(df1,df2,col):
    return pd.concat([df1.groupby(col).first(),
                      df2.groupby(col).last()],
                      axis=1).drop('group',axis=1)
 
res = my_func(one,two,'name')

Which prints:

print(res)
      col_1  col_2  col_3  col_4  col_5
name                                   
ann       1    NaN   2.00   2.00   3.00
ben       0   1.00   2.00   3.00    NaN
cat       2   3.00    NaN    NaN    NaN

If i'm not mistaken this gets you what you need.

like image 140
sophocles Avatar answered Nov 27 '25 03:11

sophocles



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!