I have two dataframes:
import pandas as pd
df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})
Now I need to get the rows where columns A and B from df1 match columns A and CC from df2. I tried the possible merge options, such as:
both_DFS=pd.merge(df1,df2, how='left',left_on=['A','B'],right_on=['A','CC'])
but this does not give me the row information from df2, which is what I need: I get all the column names from df2, but the rows are just empty or NaN.
And then I tried:
Both_DFs=pd.merge(df1,df2, how='left',left_on=['A','B'],right_on=['A','CC'])[['A','B','CC']]
And this gives me the error:
KeyError: "['B'] not in index"
I am aiming for a merged DataFrame with all columns from both df1 and df2. Any suggestions would be great.
Desired output:
Both_DFs
A B C BB CC DD
0 A1 121 K0 B0 121 D0
So in my dataframes (df1 and df2), only one row is an exact match on both columns of interest. That is, only one row of df1 has values in columns A and B that exactly match the values in columns A and CC of a row in df2.
You can also use join with default left join or merge, and last, if necessary, remove rows with NaNs by dropna:
print (df1.join(df2.set_index('A'), on='A').dropna())
A B C BB CC DD
0 A1 121 K0 B0 121 D0
1 A1 345 K1 B0 121 D0
3 A3 146 K1 B3 345 D1
print (pd.merge(df1, df2, on='A', how='left').dropna())
A B C BB CC DD
0 A1 121 K0 B0 121 D0
1 A1 345 K1 B0 121 D0
3 A3 146 K1 B3 345 D1
EDIT:
I think you need an inner join (it is the default, so how='inner' can be omitted):
Both_DFs = pd.merge(df1,df2, left_on=['A','B'],right_on=['A','CC'])
print (Both_DFs)
A B C BB CC DD
0 A1 121 K0 B0 121 D0
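To see why the two-key inner join is the one that matches the desired output, here is a minimal sketch that reuses the question's data and compares it with joining on A alone. The names on_a and on_both are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

# Inner join on A alone keeps every df1 row whose A value appears in df2.
on_a = pd.merge(df1, df2, on='A')

# Inner join on both pairs (A with A, B with CC) keeps only exact matches.
on_both = pd.merge(df1, df2, left_on=['A', 'B'], right_on=['A', 'CC'])

print(len(on_a), len(on_both))  # 3 1
```

Joining on A alone returns three rows because both A1 rows and the A3 row find a partner in df2, while adding the B/CC pair narrows that down to the single exact match.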
Well, if you set column A as the index and merge on the index, it works:
Both_DFs = pd.merge(df1.set_index('A'), df2.set_index('A'), how='left', left_index=True, right_index=True).dropna().reset_index()
This results in:
A B C BB CC DD
0 A1 121 K0 B0 121 D0
1 A1 345 K1 B0 121 D0
2 A3 146 K1 B3 345 D1
EDIT
You just needed:
Both_DFs = pd.merge(df1,df2, how='left',left_on=['A','B'],right_on=['A','CC']).dropna()
Which gives:
A B C BB CC DD
0 A1 121 K0 B0 121 D0
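One caveat with dropna() is that it would also drop rows that had NaN in the original data. As a possible alternative, the sketch below uses the indicator argument of pd.merge to mark which rows found a match and filters on that instead. The names checked and matched are only illustrative, and the data is copied from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

# indicator=True adds a '_merge' column with 'both', 'left_only' or 'right_only'.
checked = pd.merge(df1, df2, how='left',
                   left_on=['A', 'B'], right_on=['A', 'CC'],
                   indicator=True)

# Keep only the rows that matched on both key pairs, then drop the helper column.
matched = checked[checked['_merge'] == 'both'].drop(columns='_merge')
print(matched)
```

Printing checked before filtering is also a quick way to inspect which df1 rows failed to match.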