
Add new columns to pandas dataframe based on other dataframe

I'm trying to add a new column (two columns, in fact) to a pandas dataframe, with data that comes from another dataframe.

I have the following two dataframes (they are examples for this purpose; the original dataframes are much bigger):

In [116]: df0
Out[116]:     
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5


In [118]: df1
Out[118]: 
   A  D  E
0  2  7  2
1  6  5  5
2  4  3  2
3  0  1  0
4  5  4  6
5  0  1  0

And I want to get a new dataframe (or have the columns added to df0, either is fine), like this:

df2: 
   A  B  C  D  E
0  0  1  0  1  0
1  2  3  2  7  2
2  4  5  4  3  2
3  5  5  5  4  6

As you can see, the row with A=6, which is present in df1 but not in df0, does not appear in the resulting dataframe. Also, the row with A=0 is duplicated in df1 but appears only once in the result df2.

Actually, I'm having trouble with the selection method. I can do this:

df1.loc[df1['A'].isin(df0['A'])]

But I'm not sure how to keep only the unique rows (remember that df1 can contain duplicated data) and add the two columns to df2 (or to df0). I've searched here, but I don't see how to apply something like groupby, or even map.

Any idea?

Thanks!

gonzadevelop asked Sep 06 '16 23:09


1 Answer

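This is a basic application of merge (docs). Join on column A rather than on the index, and drop the duplicated keys from df1 first:

import pandas as pd
# Keep one row per value of A in df1, then attach D and E to every row of df0
df2 = pd.merge(df0, df1.drop_duplicates(subset='A'), on='A', how='left')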
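For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the two example frames from the question and joins them on column A. It assumes that when df1 repeats a value of A, keeping the first occurrence is acceptable (in the example the repeated rows are identical); the name df1_unique is just an illustrative helper.

```python
import pandas as pd

# Example frames copied from the question
df0 = pd.DataFrame({'A': [0, 2, 4, 5],
                    'B': [1, 3, 5, 5],
                    'C': [0, 2, 4, 5]})
df1 = pd.DataFrame({'A': [2, 6, 4, 0, 5, 0],
                    'D': [7, 5, 3, 1, 4, 1],
                    'E': [2, 5, 2, 0, 6, 0]})

# Assumption: for repeated values of A in df1, the first row wins
df1_unique = df1.drop_duplicates(subset='A')

# Left join on A: every row of df0 is kept in its original order,
# and rows of df1 whose A is absent from df0 (A=6) are left out
df2 = df0.merge(df1_unique, on='A', how='left')

print(df2)
```

With how='left', a row of df0 that has no match in df1 would still be kept, with NaN in D and E; how='inner' would drop such rows instead.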
benten answered Sep 27 '22 01:09