df_a
and df_b
are two dataframes that looks like following
df_a
A B C D E
x1 Apple 0.3 0.9 0.6
x1 Orange 0.1 0.5 0.2
x2 Apple 0.2 0.2 0.1
x2 Orange 0.3 0.4 0.9
x2 Mango 0.1 0.2 0.3
x3 Orange 0.3 0.1 0.2
df_b
A B_new F
x1 Apple 0.3
x1 Mango 0.2
x1 Orange 0.1
x2 Apple 0.2
x2 Orange 0.3
x2 Mango 0.1
x3 Orange 0.3
x3 Mango 0.2
x3 Apple 0.1
I want my final_df
to contain all the rows contained in df_a
such that it contemplates the unique combination of df_a['A'] == df_b['A']
and df_a['B'] == df_b['B_new']
.
I've tried doing outer join and then drop duplicates w.r.t columns A and B in final_df
but the value of B_new is not retained.
Following is how I want my result_df
to look like:
result_df
A B C D E B_new F
x1 Apple 0.3 0.9 0.6 Apple 0.3
x1 Orange 0.1 0.5 0.2 Orange 0.1
x2 Apple 0.2 0.2 0.1 Apple 0.2
x2 Orange 0.3 0.4 0.9 Orange 0.3
x2 Mango 0.1 0.2 0.3 Mango 0.1
x3 Orange 0.3 0.1 0.2 Orange 0.3
I also tried left outer join:
final_df = pd.merge(df_a, df_b, how="left", on=['A'])
The size of this dataframe is a union of df_a
and df_b
which is not what I want.
Appreciate any suggestions.
Dataframes in Pandas can be merged using pandas. merge() method. Returns : A DataFrame of the two merged objects. While working on datasets there may be a need to merge two data frames with some complex conditions, below are some examples of merging two data frames with some complex conditions.
To merge two pandas DataFrames on multiple columns use pandas. merge() method. merge() is considered more versatile and flexible and we also have the same method in DataFrame.
Joining two DataFrames can be done in multiple ways (left, right, and inner) depending on what data must be in the final DataFrame. to_csv can be used to write out DataFrames in CSV format.
You need an inner merge, specifying both merge columns in each case:
res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])
print(res)
A B C D E B_new F
0 x1 Apple 0.3 0.9 0.6 Apple 0.3
1 x1 Orange 0.1 0.5 0.2 Orange 0.1
2 x2 Apple 0.2 0.2 0.1 Apple 0.2
3 x2 Orange 0.3 0.4 0.9 Orange 0.3
4 x2 Mango 0.1 0.2 0.3 Mango 0.1
5 x3 Orange 0.3 0.1 0.2 Orange 0.3
You can still achieve this with a left join which is very ideal.
See below:
final_df = pd.merge(df_a, df_b[['A', 'B_new','F']], how="left", left_on=['A', 'B'], right_on=['A', 'B_new']);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With