Pandas: how to merge two dataframes on multiple columns?

I have 2 dataframes, df1 and df2.

df1 contains information about interactions between pairs of people.

df1
     Name1   Name2 
0    Jack    John   
1    Sarah   Jack   
2    Sarah   Eva    
3    Eva     Tom    
4    Eva     John   

df2 contains the status (Y) of people in general, including some, but not all, of the people in df1.

df2
     Name     Y 
0    Jack     0   
1    John     1   
2    Sarah    0       
3    Tom      1 
4    Laura    0
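To make the example reproducible, here is a minimal sketch that builds the two frames; the values are transcribed directly from the tables above and nothing else is assumed:

```python
import pandas as pd

# df1: each row is an interaction between two people
df1 = pd.DataFrame({
    'Name1': ['Jack', 'Sarah', 'Sarah', 'Eva', 'Eva'],
    'Name2': ['John', 'Jack', 'Eva', 'Tom', 'John'],
})

# df2: status Y per person; Laura is not in df1, and Eva (from df1) is missing here
df2 = pd.DataFrame({
    'Name': ['Jack', 'John', 'Sarah', 'Tom', 'Laura'],
    'Y': [0, 1, 0, 1, 0],
})

print(df1)
print(df2)
```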

I would like df2 to keep only the people who appear in df1 (so Laura is dropped), and to include the people from df1 who are missing from df2 (e.g. Eva) with Y set to NaN, like this:

df2
     Name     Y 
0    Jack     0   
1    John     1   
2    Sarah    0       
3    Tom      1 
4    Eva     NaN
asked Dec 20 '25 00:12 by emax

1 Answer

Create a DataFrame from the unique names in df1, then map each name to its Y value from df2:

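import numpy as np
import pandas as pd
# np.unique flattens df1 and returns its distinct names, sorted
df = pd.DataFrame(np.unique(df1.values), columns=['Name'])
# look up each name's Y in df2; names missing from df2 get NaN
df['Y'] = df.Name.map(df2.set_index('Name')['Y'])

print(df)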
    Name    Y
0    Eva  NaN
1   Jack  0.0
2   John  1.0
3  Sarah  0.0
4    Tom  1.0

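Note: the original order is not preserved, because np.unique returns the names sorted alphabetically. Y also becomes float, since NaN cannot be stored in an integer column.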
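If order matters, and since the question asks about merging, here is a sketch of an alternative (my own variation, not part of the answer above): pd.unique keeps values in order of first appearance instead of sorting, and a left merge keeps the left frame's row order. It assumes df1 and df2 exactly as in the question. Note that this gives first-appearance order in df1 (Jack, John, Sarah, Eva, Tom), which is not quite the order shown in the question's desired output.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name1': ['Jack', 'Sarah', 'Sarah', 'Eva', 'Eva'],
    'Name2': ['John', 'Jack', 'Eva', 'Tom', 'John'],
})
df2 = pd.DataFrame({
    'Name': ['Jack', 'John', 'Sarah', 'Tom', 'Laura'],
    'Y': [0, 1, 0, 1, 0],
})

# ravel() flattens row by row; pd.unique keeps first-appearance order (no sorting)
names = pd.DataFrame({'Name': pd.unique(df1.values.ravel())})

# Left merge: one row per name in df1, in that order.
# Names missing from df2 (Eva) get NaN; names only in df2 (Laura) are dropped.
result = names.merge(df2, on='Name', how='left')
print(result)
```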

answered Dec 22 '25 14:12 by Space Impact