df_a and df_b are two dataframes that look like the following:
df_a
A   B       C      D     E
x1  Apple   0.3   0.9    0.6
x1  Orange  0.1   0.5    0.2
x2  Apple   0.2   0.2    0.1
x2  Orange  0.3   0.4    0.9
x2  Mango   0.1   0.2    0.3
x3  Orange  0.3   0.1    0.2
df_b
A   B_new   F    
x1  Apple   0.3  
x1  Mango   0.2  
x1  Orange  0.1   
x2  Apple   0.2   
x2  Orange  0.3     
x2  Mango   0.1  
x3  Orange  0.3  
x3  Mango   0.2  
x3  Apple   0.1  
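To make the question reproducible, the two frames can be rebuilt by hand. This is a minimal sketch, assuming pandas is installed. The values are copied row by row from the tables above.

```python
import pandas as pd

# df_a, typed in row by row from the table above
df_a = pd.DataFrame(
    [
        ["x1", "Apple", 0.3, 0.9, 0.6],
        ["x1", "Orange", 0.1, 0.5, 0.2],
        ["x2", "Apple", 0.2, 0.2, 0.1],
        ["x2", "Orange", 0.3, 0.4, 0.9],
        ["x2", "Mango", 0.1, 0.2, 0.3],
        ["x3", "Orange", 0.3, 0.1, 0.2],
    ],
    columns=["A", "B", "C", "D", "E"],
)

# df_b names its fruit column B_new (not B), so the keys differ by name
df_b = pd.DataFrame(
    [
        ["x1", "Apple", 0.3],
        ["x1", "Mango", 0.2],
        ["x1", "Orange", 0.1],
        ["x2", "Apple", 0.2],
        ["x2", "Orange", 0.3],
        ["x2", "Mango", 0.1],
        ["x3", "Orange", 0.3],
        ["x3", "Mango", 0.2],
        ["x3", "Apple", 0.1],
    ],
    columns=["A", "B_new", "F"],
)

print(df_a.shape, df_b.shape)  # (6, 5) (9, 3)
```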
I want final_df to contain every row of df_a, matched to df_b on the combination of df_a['A'] == df_b['A'] and df_a['B'] == df_b['B_new'].
I've tried an outer join followed by dropping duplicates with respect to columns A and B in final_df, but the value of B_new is not retained.
This is how I want result_df to look:
result_df
A   B       C     D     E     B_new   F
x1  Apple   0.3   0.9   0.6   Apple   0.3
x1  Orange  0.1   0.5   0.2   Orange  0.1
x2  Apple   0.2   0.2   0.1   Apple   0.2
x2  Orange  0.3   0.4   0.9   Orange  0.3
x2  Mango   0.1   0.2   0.3   Mango   0.1
x3  Orange  0.3   0.1   0.2   Orange  0.3
I also tried a left outer join:
final_df = pd.merge(df_a, df_b, how="left", on=['A'])
The resulting dataframe is much larger than df_a, because each row of df_a is paired with every row of df_b that has the same A value, which is not what I want.
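Merging on A alone pairs each row of df_a with every row of df_b that shares its A value, so when every A in df_a also appears in df_b the result has, per A value, (rows in df_a) × (rows in df_b) rows. A minimal sketch that checks this count; the frames are trimmed to the columns needed here, an assumption made for brevity.

```python
import pandas as pd

# Trimmed copies of the question's frames: only the columns needed here
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
})
df_b = pd.DataFrame({
    "A": ["x1"] * 3 + ["x2"] * 3 + ["x3"] * 3,
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange", "Mango",
              "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Left join on A only: every df_a row meets every df_b row with the same A
final_df = pd.merge(df_a, df_b, how="left", on=["A"])

# Per A value, multiply the row counts of both frames, then sum
# (valid here because every A in df_a also appears in df_b)
expected = (df_a["A"].value_counts() * df_b["A"].value_counts()).sum()

print(len(df_a), len(final_df), expected)  # 6 18 18
```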
Appreciate any suggestions.
DataFrames in pandas can be merged with the pandas.merge() method, which returns a DataFrame of the two merged objects. When working with datasets you may need to merge two DataFrames on more complex conditions, such as matching on several columns at once.
To merge two pandas DataFrames on multiple columns, use pandas.merge(). It is versatile and flexible, and the same functionality is available as the DataFrame.merge() method.
Two DataFrames can be joined in several ways (left, right, outer, and inner), depending on which rows must appear in the final DataFrame. DataFrame.to_csv() can be used to write a DataFrame out in CSV format.
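The choice of `how` decides which rows survive when merging on both keys. A minimal sketch comparing row counts for the four join types on the question's data, with the frames trimmed to the key columns (an assumption for brevity):

```python
import pandas as pd

df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
})
df_b = pd.DataFrame({
    "A": ["x1"] * 3 + ["x2"] * 3 + ["x3"] * 3,
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange", "Mango",
              "Orange", "Mango", "Apple"],
})

# Merge on the pair (A, B) == (A, B_new) with each join type
counts = {
    how: len(df_a.merge(df_b, how=how, left_on=["A", "B"], right_on=["A", "B_new"]))
    for how in ["inner", "left", "right", "outer"]
}
print(counts)  # {'inner': 6, 'left': 6, 'right': 9, 'outer': 9}
```

Inner and left agree here only because every (A, B) pair in df_a has a partner in df_b; right and outer additionally keep the three df_b rows (x1 Mango, x3 Mango, x3 Apple) that have no match in df_a.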
You need an inner merge, specifying both key columns for each DataFrame:
res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])
print(res)
    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3
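If duplicate key pairs could ever appear in either frame, the inner merge would silently multiply rows. A minimal sketch of the same merge with pandas' `validate` argument added as a guard; the data is rebuilt from the question's tables.

```python
import pandas as pd

df_a = pd.DataFrame(
    [["x1", "Apple", 0.3, 0.9, 0.6], ["x1", "Orange", 0.1, 0.5, 0.2],
     ["x2", "Apple", 0.2, 0.2, 0.1], ["x2", "Orange", 0.3, 0.4, 0.9],
     ["x2", "Mango", 0.1, 0.2, 0.3], ["x3", "Orange", 0.3, 0.1, 0.2]],
    columns=["A", "B", "C", "D", "E"],
)
df_b = pd.DataFrame(
    [["x1", "Apple", 0.3], ["x1", "Mango", 0.2], ["x1", "Orange", 0.1],
     ["x2", "Apple", 0.2], ["x2", "Orange", 0.3], ["x2", "Mango", 0.1],
     ["x3", "Orange", 0.3], ["x3", "Mango", 0.2], ["x3", "Apple", 0.1]],
    columns=["A", "B_new", "F"],
)

# validate="one_to_one" raises pandas.errors.MergeError if (A, B) repeats in
# df_a or (A, B_new) repeats in df_b, instead of silently duplicating rows
res = df_a.merge(
    df_b,
    how="inner",
    left_on=["A", "B"],
    right_on=["A", "B_new"],
    validate="one_to_one",
)

# The shared key name A collapses into one column; B and B_new are both kept
# and line up row by row
assert (res["B"] == res["B_new"]).all()
print(res.shape)  # (6, 7)
```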
You can also achieve this with a left join, which keeps every row of df_a even when it has no match in df_b.
See below:
final_df = pd.merge(df_a, df_b[['A', 'B_new', 'F']], how="left", left_on=['A', 'B'], right_on=['A', 'B_new'])
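A left join keeps df_a's row count only when each (A, B_new) pair is unique in df_b. A minimal sketch that checks this with `validate="many_to_one"` and uses `indicator=True` to list any df_a rows that found no partner; the frames are trimmed to the key columns and F, an assumption for brevity. Selecting df_b[['A', 'B_new', 'F']] as in the merge line only matters if df_b carries extra columns.

```python
import pandas as pd

df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
})
df_b = pd.DataFrame({
    "A": ["x1"] * 3 + ["x2"] * 3 + ["x3"] * 3,
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange", "Mango",
              "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

final_df = pd.merge(
    df_a,
    df_b[["A", "B_new", "F"]],
    how="left",
    left_on=["A", "B"],
    right_on=["A", "B_new"],
    validate="many_to_one",  # raise if (A, B_new) repeats in df_b
    indicator=True,          # adds "_merge": "both" or "left_only" in a left join
)

# Rows of df_a with no matching (A, B_new) in df_b would show up here
unmatched = final_df[final_df["_merge"] == "left_only"]
final_df = final_df.drop(columns="_merge")

print(len(df_a), len(final_df), len(unmatched))  # 6 6 0
```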