I would like to merge two dataframes, df2 might have more columns and will always be 1 row. I would like the data from the df2 row to overwrite the matching row in df. NOTE: ser
and no
columns together make a row unique.
import pandas as pd
df = pd.DataFrame({'ser': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2}, 'no': {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0, 7: 1, 8: 2}, 'c': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}})
df2 = pd.DataFrame({'ser': {0: 1}, 'no': {0: 2}, 'c': {0: 88}, 'd': {0: 90}})
>>> df
ser no c
0 0 0 1
1 0 1 1
2 0 2 1
3 1 0 1
4 1 1 1
5 1 2 1
6 2 0 1
7 2 1 1
8 2 2 1
>>> df2
ser no c d
0 1 2 88 90
I have tried merge left but this creates two c columns (c_x and c_y):
>>> pd.merge(df,df2,how='left',on=['ser','no'])
ser no c_x c_y d
0 0 0 1 NaN NaN
1 0 1 1 NaN NaN
2 0 2 1 NaN NaN
3 1 0 1 NaN NaN
4 1 1 1 NaN NaN
5 1 2 1 88.0 90.0
6 2 0 1 NaN NaN
7 2 1 1 NaN NaN
8 2 2 1 NaN NaN
DESIRED OUTPUT:
ser no c d
0 0 0 1 NaN
1 0 1 1 NaN
2 0 2 1 NaN
3 1 0 1 NaN
4 1 1 1 NaN
5 1 2 88 90
6 2 0 1 NaN
7 2 1 1 NaN
8 2 2 1 NaN
df[['ser', 'no']].merge(df2, 'left').set_axis(df.index).fillna(df)
ser no c d
0 0 0 1.0 NaN
1 0 1 1.0 NaN
2 0 2 1.0 NaN
3 1 0 1.0 NaN
4 1 1 1.0 NaN
5 1 2 88.0 90.0
6 2 0 1.0 NaN
7 2 1 1.0 NaN
8 2 2 1.0 NaN
I'm going to merge on the columns ['ser', 'no']
and don't want to specify in the merge
call. Also, I don't want goofy duplicate column names like 'c_x'
and 'c_y'
so I slice only columns that I want in common then merge
df[['ser', 'no']].merge(df2, 'left')
When I merge, I want only rows from the left dataframe. However, merge
usually produces a number of rows vastly different from the original dataframes and therefore produces a new index
. However, NOTE this is assuming the right dataframe (df2
) has NO DUPLICATES with respect ['ser', 'no']
then a 'left'
merge
should produce the same exact number of rows as the left dataframe (df
). But it won't have the same index
necessarily. It turns out that in this example it does. But I don't want to take chances. So I use set_axis
set_axis(df.index)
Finally, since the resulting dataframe has the same index
and columns
as df
. I can fill in the missing bits with:
fillna(df)
Update: What you are looking for is combine_first
:
(df2.set_index(['ser','no'])
.combine_first(df.set_index(['ser','no']))
.reset_index()
)
You can also try concat
, which is more similar to 'outer'
merge when the pair ser,no
are unique valued.
pd.concat([df2,df]).groupby(['ser','no'], as_index=False).first()
Output:
ser no c d
0 0 0 1 NaN
1 0 1 1 NaN
2 0 2 1 NaN
3 1 0 1 NaN
4 1 1 1 NaN
5 1 2 88 90.0
6 2 0 1 NaN
7 2 1 1 NaN
8 2 2 1 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With