Joining two pandas dataframes based on multiple conditions

df_a and df_b are two dataframes that look like the following:

df_a
A   B       C      D     E
x1  Apple   0.3   0.9    0.6
x1  Orange  0.1   0.5    0.2
x2  Apple   0.2   0.2    0.1
x2  Orange  0.3   0.4    0.9
x2  Mango   0.1   0.2    0.3
x3  Orange  0.3   0.1    0.2


df_b
A   B_new   F    
x1  Apple   0.3  
x1  Mango   0.2  
x1  Orange  0.1   
x2  Apple   0.2   
x2  Orange  0.3     
x2  Mango   0.1  
x3  Orange  0.3  
x3  Mango   0.2  
x3  Apple   0.1

I want my final_df to contain all the rows of df_a, each matched with the single df_b row where df_a['A'] == df_b['A'] and df_a['B'] == df_b['B_new'].

I've tried doing an outer join and then dropping duplicates with respect to columns A and B in final_df, but the value of B_new is not retained.

This is how I want my result_df to look:

result_df

A   B       C    D    E    B_new   F
x1  Apple   0.3  0.9  0.6  Apple   0.3
x1  Orange  0.1  0.5  0.2  Orange  0.1
x2  Apple   0.2  0.2  0.1  Apple   0.2
x2  Orange  0.3  0.4  0.9  Orange  0.3
x2  Mango   0.1  0.2  0.3  Mango   0.1
x3  Orange  0.3  0.1  0.2  Orange  0.3

I also tried a left outer join:

final_df = pd.merge(df_a, df_b, how="left", on=['A'])

The resulting dataframe has many more rows than df_a, because each df_a row is paired with every df_b row that shares the same A. That is not what I want.

Appreciate any suggestions.

You need an inner merge, specifying both key columns for each dataframe:

res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])

print(res)

    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3
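
To check this end to end, here is a minimal sketch that rebuilds both frames from the tables in the question and runs the same inner merge. The `validate="one_to_one"` argument is my addition, not something the question asks for; it makes pandas raise a MergeError if a key pair repeats on either side. The merge on A alone is included only to show where the extra rows in the question's attempt come from.

```python
import pandas as pd

# Data copied from the tables in the question.
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange",
              "Mango", "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Match on both key columns; B on the left lines up with B_new on the right.
# validate="one_to_one" is an optional safety check (my assumption that the
# key pairs are meant to be unique on both sides).
res = df_a.merge(
    df_b,
    how="inner",
    left_on=["A", "B"],
    right_on=["A", "B_new"],
    validate="one_to_one",
)

# For comparison: matching on A alone pairs every df_a row with every
# df_b row that has the same A, which multiplies the row count.
on_a_only = df_a.merge(df_b, how="left", on=["A"])

print(res)
print(len(res), len(on_a_only))
```

With this data the two-column merge returns 6 rows, one per row of df_a, while the merge on A alone returns 18 (2×3 + 3×3 + 1×3).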

You can also achieve this with a left join, which keeps every row of df_a even if df_b has no matching row:

final_df = pd.merge(df_a, df_b[['A', 'B_new', 'F']], how="left", left_on=['A', 'B'], right_on=['A', 'B_new'])
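
For the data in the question, the left and inner joins give the same rows, because every (A, B) pair in df_a has a partner in df_b. A small sketch of where they differ, using a hypothetical unmatched row ("x3", "Kiwi") that is not in the original data, and `indicator=True` to label where each row came from:

```python
import pandas as pd

# Hypothetical two-row frames; the ("x3", "Kiwi") row is invented for
# illustration and has no partner in df_b.
df_a = pd.DataFrame({"A": ["x1", "x3"], "B": ["Apple", "Kiwi"], "C": [0.3, 0.5]})
df_b = pd.DataFrame({"A": ["x1", "x3"], "B_new": ["Apple", "Orange"], "F": [0.3, 0.3]})

keys = dict(left_on=["A", "B"], right_on=["A", "B_new"])

# Left join: all df_a rows survive; unmatched ones get NaN in B_new and F.
left = pd.merge(df_a, df_b, how="left", indicator=True, **keys)

# Inner join: only rows with a match on both keys survive.
inner = pd.merge(df_a, df_b, how="inner", **keys)

print(left)
print(inner)
```

The `_merge` column added by `indicator=True` reads `both` for matched rows and `left_only` for rows that exist only in df_a, which makes it easy to spot keys missing from df_b.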

Tags: python, merge, pandas, dataframe

iprof0214

2 Answers

jpp

Daniel
