Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas left merge keeping data in right dataframe on duplicte columns

I would like to merge two dataframes, df2 might have more columns and will always be 1 row. I would like the data from the df2 row to overwrite the matching row in df. NOTE: ser and no columns together make a row unique.

import pandas as pd

df = pd.DataFrame({'ser': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2}, 'no': {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0, 7: 1, 8: 2}, 'c': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}})

df2 = pd.DataFrame({'ser': {0: 1}, 'no': {0: 2}, 'c': {0: 88}, 'd': {0: 90}})


>>> df
   ser  no  c
0    0   0  1
1    0   1  1
2    0   2  1
3    1   0  1
4    1   1  1
5    1   2  1
6    2   0  1
7    2   1  1
8    2   2  1

>>> df2
   ser  no   c   d
0    1   2  88  90

I have tried merge left but this creates two c columns (c_x and c_y):

>>> pd.merge(df,df2,how='left',on=['ser','no'])
   ser  no  c_x   c_y     d
0    0   0    1   NaN   NaN
1    0   1    1   NaN   NaN
2    0   2    1   NaN   NaN
3    1   0    1   NaN   NaN
4    1   1    1   NaN   NaN
5    1   2    1  88.0  90.0
6    2   0    1   NaN   NaN
7    2   1    1   NaN   NaN
8    2   2    1   NaN   NaN

DESIRED OUTPUT:

   ser  no    c   d  
0    0   0    1   NaN 
1    0   1    1   NaN  
2    0   2    1   NaN 
3    1   0    1   NaN 
4    1   1    1   NaN 
5    1   2    88  90 
6    2   0    1   NaN
7    2   1    1   NaN
8    2   2    1   NaN
like image 840
RMRiver Avatar asked Jan 25 '23 10:01

RMRiver


2 Answers

Frankenstein Answer

df[['ser', 'no']].merge(df2, 'left').set_axis(df.index).fillna(df)

   ser  no     c     d
0    0   0   1.0   NaN
1    0   1   1.0   NaN
2    0   2   1.0   NaN
3    1   0   1.0   NaN
4    1   1   1.0   NaN
5    1   2  88.0  90.0
6    2   0   1.0   NaN
7    2   1   1.0   NaN
8    2   2   1.0   NaN

Explanation

  1. I'm going to merge on the columns ['ser', 'no'] and don't want to specify in the merge call. Also, I don't want goofy duplicate column names like 'c_x' and 'c_y' so I slice only columns that I want in common then merge

     df[['ser', 'no']].merge(df2, 'left')
    
  2. When I merge, I want only rows from the left dataframe. However, merge usually produces a number of rows vastly different from the original dataframes and therefore produces a new index. However, NOTE this is assuming the right dataframe (df2) has NO DUPLICATES with respect ['ser', 'no'] then a 'left' merge should produce the same exact number of rows as the left dataframe (df). But it won't have the same index necessarily. It turns out that in this example it does. But I don't want to take chances. So I use set_axis

      set_axis(df.index)
    
  3. Finally, since the resulting dataframe has the same index and columns as df. I can fill in the missing bits with:

    fillna(df)
    
like image 89
piRSquared Avatar answered Jan 27 '23 03:01

piRSquared


Update: What you are looking for is combine_first:

(df2.set_index(['ser','no'])
    .combine_first(df.set_index(['ser','no']))
    .reset_index()
)

You can also try concat, which is more similar to 'outer' merge when the pair ser,no are unique valued.

pd.concat([df2,df]).groupby(['ser','no'], as_index=False).first()

Output:

   ser  no   c     d
0    0   0   1   NaN
1    0   1   1   NaN
2    0   2   1   NaN
3    1   0   1   NaN
4    1   1   1   NaN
5    1   2  88  90.0
6    2   0   1   NaN
7    2   1   1   NaN
8    2   2   1   NaN
like image 27
Quang Hoang Avatar answered Jan 27 '23 04:01

Quang Hoang