Combining different parts of two rows in a dataframe using Pandas

Question

I have a large (>10000 rows) dataframe like below, and I have to combine the rows with the same name by col_1, col_2, col_3 of group a and col_4, col_5 of group b or group c (b & c won't exist at the same time).

I could just find online how to combine (add up) every column of the rows, put not in different parts.

Original dataframe

name	group	col_1	col_2	col_3	col_4	col_5
ann	a	1	NaN	2	3	NaN
ann	b	1	NaN	NaN	2	3
ben	a	0	1	2	1	NaN
ben	c	0	NaN	NaN	3	NaN
cat	a	2	3	NaN	NaN	NaN

Desired result

name	col_1	col_2	col_3	col_4	col_5
ann	1	NaN	2	2	3
ben	0	1	2	3	NaN
cat	2	3	NaN	NaN	NaN

sophocles · Accepted Answer

A bit of an ad-hoc approach, but one way you could do this is by sorting your dataframe by name and group, then split up your df into 2 parts (i.e. col1/2/3 for group a), perform a groupby and use first() on the first df and last() on the second, and then put back together using concat:

import pandas as pd

df.sort_values(['name','group'],ascending=True,inplace=True)

one = df.iloc[:,:5]
two = pd.concat([df.iloc[:,:2],df[['col_4','col_5']]],axis=1)

def my_func(df1,df2,col):
    return pd.concat([df1.groupby(col).first(),
                      df2.groupby(col).last()],
                      axis=1).drop('group',axis=1)
 
res = my_func(one,two,'name')

Which prints:

print(res)
      col_1  col_2  col_3  col_4  col_5
name                                   
ann       1    NaN   2.00   2.00   3.00
ben       0   1.00   2.00   3.00    NaN
cat       2   3.00    NaN    NaN    NaN

If i'm not mistaken this gets you what you need.

Combining different parts of two rows in a dataframe using Pandas

Tags:

python

pandas

dataframe

blackcat

1 Answers

sophocles

Recent Activity

Donate For Us

name	group	col_1	col_2	col_3	col_4	col_5
ann	a	1	NaN	2	3	NaN
ann	b	1	NaN	NaN	2	3
ben	a	0	1	2	1	NaN
ben	c	0	NaN	NaN	3	NaN
cat	a	2	3	NaN	NaN	NaN

name	group	col_1	col_2	col_3	col_4	col_5
ann	a	1	NaN	2	3	NaN
ann	b	1	NaN	NaN	2	3
ben	a	0	1	2	1	NaN
ben	c	0	NaN	NaN	3	NaN
cat	a	2	3	NaN	NaN	NaN

Combining different parts of two rows in a dataframe using Pandas

Tags:

python

pandas

dataframe

blackcat

1 Answers

sophocles

Related questions

Recent Activity

Donate For Us

name	group	col_1	col_2	col_3	col_4	col_5
ann	a	1	NaN	2	3	NaN
ann	b	1	NaN	NaN	2	3
ben	a	0	1	2	1	NaN
ben	c	0	NaN	NaN	3	NaN
cat	a	2	3	NaN	NaN	NaN