Pandas left merge keeping data in right dataframe on duplicte columns

Question

I would like to merge two dataframes, df2 might have more columns and will always be 1 row. I would like the data from the df2 row to overwrite the matching row in df. NOTE: ser and no columns together make a row unique.

import pandas as pd

df = pd.DataFrame({'ser': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2}, 'no': {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0, 7: 1, 8: 2}, 'c': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}})

df2 = pd.DataFrame({'ser': {0: 1}, 'no': {0: 2}, 'c': {0: 88}, 'd': {0: 90}})


>>> df
   ser  no  c
0    0   0  1
1    0   1  1
2    0   2  1
3    1   0  1
4    1   1  1
5    1   2  1
6    2   0  1
7    2   1  1
8    2   2  1

>>> df2
   ser  no   c   d
0    1   2  88  90

I have tried merge left but this creates two c columns (c_x and c_y):

>>> pd.merge(df,df2,how='left',on=['ser','no'])
   ser  no  c_x   c_y     d
0    0   0    1   NaN   NaN
1    0   1    1   NaN   NaN
2    0   2    1   NaN   NaN
3    1   0    1   NaN   NaN
4    1   1    1   NaN   NaN
5    1   2    1  88.0  90.0
6    2   0    1   NaN   NaN
7    2   1    1   NaN   NaN
8    2   2    1   NaN   NaN

DESIRED OUTPUT:

   ser  no    c   d  
0    0   0    1   NaN 
1    0   1    1   NaN  
2    0   2    1   NaN 
3    1   0    1   NaN 
4    1   1    1   NaN 
5    1   2    88  90 
6    2   0    1   NaN
7    2   1    1   NaN
8    2   2    1   NaN

piRSquared · Accepted Answer

Frankenstein Answer

df[['ser', 'no']].merge(df2, 'left').set_axis(df.index).fillna(df)

   ser  no     c     d
0    0   0   1.0   NaN
1    0   1   1.0   NaN
2    0   2   1.0   NaN
3    1   0   1.0   NaN
4    1   1   1.0   NaN
5    1   2  88.0  90.0
6    2   0   1.0   NaN
7    2   1   1.0   NaN
8    2   2   1.0   NaN

Explanation

I'm going to merge on the columns ['ser', 'no'] and don't want to specify in the merge call. Also, I don't want goofy duplicate column names like 'c_x' and 'c_y' so I slice only columns that I want in common then merge
```
 df[['ser', 'no']].merge(df2, 'left')
```
When I merge, I want only rows from the left dataframe. However, merge usually produces a number of rows vastly different from the original dataframes and therefore produces a new index. However, NOTE this is assuming the right dataframe (df2) has NO DUPLICATES with respect ['ser', 'no'] then a 'left' merge should produce the same exact number of rows as the left dataframe (df). But it won't have the same index necessarily. It turns out that in this example it does. But I don't want to take chances. So I use set_axis
```
  set_axis(df.index)
```
Finally, since the resulting dataframe has the same index and columns as df. I can fill in the missing bits with:
```
fillna(df)
```

Quang Hoang · Answer

Update: What you are looking for is combine_first:

(df2.set_index(['ser','no'])
    .combine_first(df.set_index(['ser','no']))
    .reset_index()
)

You can also try concat, which is more similar to 'outer' merge when the pair ser,no are unique valued.

pd.concat([df2,df]).groupby(['ser','no'], as_index=False).first()

Output:

   ser  no   c     d
0    0   0   1   NaN
1    0   1   1   NaN
2    0   2   1   NaN
3    1   0   1   NaN
4    1   1   1   NaN
5    1   2  88  90.0
6    2   0   1   NaN
7    2   1   1   NaN
8    2   2   1   NaN

Pandas left merge keeping data in right dataframe on duplicte columns

Tags:

python

pandas

dataframe

RMRiver

2 Answers

Frankenstein Answer

Explanation

piRSquared

Quang Hoang

Recent Activity

Donate For Us

Pandas left merge keeping data in right dataframe on duplicte columns

Tags:

python

pandas

dataframe

RMRiver

2 Answers

Frankenstein Answer

Explanation

piRSquared

Quang Hoang

Related questions

Recent Activity

Donate For Us