
Merge two different dataframes on different column names [duplicate]

I have two dataframes:

import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})

df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

Now I need to get the rows where columns A and B in df1 match columns A and CC in df2. I tried the merge options I could think of, such as:

both_DFS=pd.merge(df1,df2, how='left',left_on=['A','B'],right_on=['A','CC'])

but this does not give me the row information from df2, which is what I need. That is, I get all the column names from df2, but their rows are empty (NaN).

And then I tried:

Both_DFs=pd.merge(df1,df2, how='left',left_on=['A','B'],right_on=['A','CC'])[['A','B','CC']]

And this gives me the error:

KeyError: "['B'] not in index"

I am aiming for a merged DataFrame with all columns from both df1 and df2. Any suggestions would be great.

Desired output:

Both_DFs
    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0

In my dataframes (df1 and df2), only one row matches exactly on both columns of interest. That is, only one row of columns A and B in df1 exactly matches a row of columns A and CC in df2.

user1017373 asked May 02 '17 10:05


2 Answers

You can also use join (which defaults to a left join) or merge, and then, if necessary, remove the rows containing NaNs with dropna:

print (df1.join(df2.set_index('A'), on='A').dropna())
    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
1  A1  345  K1  B0  121  D0
3  A3  146  K1  B3  345  D1

print (pd.merge(df1, df2, on='A', how='left').dropna())
    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
1  A1  345  K1  B0  121  D0
3  A3  146  K1  B3  345  D1

EDIT:

I think you need an inner join (the default, so how='inner' can be omitted):

Both_DFs = pd.merge(df1,df2, left_on=['A','B'],right_on=['A','CC'])
print (Both_DFs)
    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
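
If the inner merge comes back empty on real data, one possible cause is that the key columns are stored with different types. This is only an assumption, since the question does not show dtypes. A minimal sketch, using a hypothetical variant of df1 in which B holds integers, that normalizes B to strings before merging:

```python
import pandas as pd

# Hypothetical variant of df1: B holds integers instead of strings.
df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': [121, 345, 123, 146],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

# Cast B to str so it is comparable with CC, then pair the keys
# positionally: df1.A with df2.A, and df1.B with df2.CC.
left = df1.assign(B=df1['B'].astype(str))
inner = pd.merge(left, df2, left_on=['A', 'B'], right_on=['A', 'CC'])

print(inner)
```

Checking df1.dtypes and df2.dtypes first shows whether such a cast is needed at all.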
jezrael answered Oct 14 '22 02:10


Well, if you declare column A as index, it works:

Both_DFs = pd.merge(df1.set_index('A'), df2.set_index('A'), how='left', left_index=True, right_index=True).dropna().reset_index()

This results in:

    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
1  A1  345  K1  B0  121  D0
2  A3  146  K1  B3  345  D1

EDIT

You just needed:

Both_DFs = pd.merge(df1,df2, how='left',left_on=['A','B'],right_on=['A','CC']).dropna()

Which gives:

    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
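
One caveat with dropna() is that it also drops rows that already contained NaN before the merge. As an alternative sketch (not part of the original answer), the indicator=True option of pd.merge adds a _merge column that records where each row came from, so the matched rows can be selected explicitly:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

# Left merge on the two key pairs, recording each row's origin in '_merge'.
merged = pd.merge(df1, df2, how='left',
                  left_on=['A', 'B'], right_on=['A', 'CC'],
                  indicator=True)

# '_merge' is 'both' where a df2 row matched and 'left_only' otherwise.
matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')

print(merged)
print(matched)
```

Printing merged as well makes it easy to see which df1 rows found no partner in df2.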
zipa answered Oct 14 '22 04:10