Pandas merge dataframes with shared column, fillna in left with right

I am trying to merge two dataframes and fill the NaNs in the left df with values from the right df. I can do it with three lines of code, as below, but is there a better/shorter way?

import numpy as np
import pandas as pd

# Example data (my actual df is ~500k rows x 11 cols)
df1 = pd.DataFrame({'a': [1,2,3,4], 'b': [0,1,np.nan, 1], 'e': ['a', 1, 2,'b']})
df2 = pd.DataFrame({'a': [1,2,3,4], 'b': [np.nan, 1, 0, 1]})

# Merge the dataframes...
df = df1.merge(df2, on='a', how='left')

# Fillna in 'b' column of left df with right df...
df['b'] = df['b_x'].fillna(df['b_y'])

# Drop the columns no longer needed
df = df.drop(['b_x', 'b_y'], axis=1)

asked Jul 01 '19 20:07

Kenan

2 Answers

merge gets confused because both dataframes have a 'b' column, and the left and right versions have NaNs in different places. The goal is to avoid ending up with the extra 'b_x' and 'b_y' columns in the first place:

1. Slice the non-shared columns 'a' and 'e' from df1.
2. Do merge(df2, 'left'); this picks up 'b' from the right dataframe, since after slicing it only exists there.
3. Finally, do df1.update(...); this overwrites df1['b'] in place with the non-NaN values of the 'b' taken from df2. Where df2's 'b' is NaN, df1's value is kept.

Solution:

df1.update(df1[['a', 'e']].merge(df2, 'left'))

df1

   a    b  e
0  1  0.0  a
1  2  1.0  1
2  3  0.0  2
3  4  1.0  b
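
One behavior worth checking before relying on update: as far as I know, it copies every non-NaN value from the other frame, so where both frames hold a value for 'b' the right-hand one wins, whereas fillna and combine_first keep the left-hand one. In the example data the two frames agree wherever both are populated, so the difference does not show. A minimal sketch with made-up conflicting values (left and right are hypothetical frames, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row a == 2 has a different non-NaN 'b' in each frame.
left = pd.DataFrame({'a': [1, 2, 3], 'b': [0.0, 1.0, np.nan]})
right = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 5.0, 7.0]})

# Right-hand 'b' values lined up against the rows of `left`.
merged = left[['a']].merge(right, on='a', how='left')

# update: every non-NaN value in `merged` overwrites `left` (right wins).
updated = left.copy()
updated.update(merged)

# combine_first: only the NaNs in `left` are filled (left wins).
filled = left.combine_first(merged)

print(updated['b'].tolist())  # [0.0, 5.0, 7.0]
print(filled['b'].tolist())   # [0.0, 1.0, 7.0]
```

If the intent is strictly "fill the NaNs in the left df", the combine_first result is the one that matches the original three-line approach.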

Note: Because I used merge(..., how='left'), the row order of the calling dataframe is preserved. Suppose the values of a in df1 were not in order:

   a    b  e
0  1  0.0  a
1  2  1.0  1
2  4  1.0  b
3  3  NaN  2

The result would be

df1.update(df1[['a', 'e']].merge(df2, 'left'))

df1

   a    b  e
0  1  0.0  a
1  2  1.0  1
2  4  1.0  b
3  3  0.0  2

This is as expected: the row order is preserved and the NaN at a == 3 is filled with 0.0.

Further...

If you want to be more explicit when there may be more columns involved:

df1.update(df1.drop(columns='b').merge(df2, how='left', on='a'))

Even Further...

If you don't want to modify df1 in place, use combine_first, which returns a new dataframe and fills only the NaNs in df1:

Quick

df1.combine_first(df1[['a', 'e']].merge(df2, 'left'))

Explicit

df1.combine_first(df1.drop(columns='b').merge(df2, how='left', on='a'))

EVEN FURTHER!...

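The 'left' merge preserves row order but NOT the index: the result gets a fresh 0..n-1 index, while update and combine_first align on index labels. This is the ultra-conservative approach, which puts df1's index back on the merged frame first:

df3 = df1.drop(columns='b').merge(df2, how='left', on='a').set_index(df1.index)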
df1.combine_first(df3)
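
To see why the index matters, here is a minimal sketch. It assumes a hypothetical left frame with non-default index labels and unique keys in the right frame, so the merge returns exactly one row per left row:

```python
import numpy as np
import pandas as pd

# Hypothetical left frame whose index is not 0..n-1.
left = pd.DataFrame({'a': [1, 2, 3], 'b': [0.0, np.nan, np.nan]},
                    index=[10, 11, 12])
right = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 1.0, 0.0]})

merged = left.drop(columns='b').merge(right, on='a', how='left')

# `merged` is indexed 0, 1, 2, so nothing lines up with labels 10, 11, 12;
# combine_first returns the union of both indexes instead of filling.
misaligned = left.combine_first(merged)

# Restoring the left index first (valid only because the row counts match).
aligned = left.combine_first(merged.set_index(left.index))

print(len(misaligned))        # 6
print(aligned['b'].tolist())  # [0.0, 1.0, 0.0]
```

With duplicate keys in the right frame the merge would return more rows than the left frame has, and set_index(left.index) would fail with a length mismatch, so checking key uniqueness beforehand is a reasonable precaution.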

answered Sep 18 '22 14:09

piRSquared

Short version

df1['b'] = df1['b'].fillna(df1['a'].map(df2.set_index('a')['b']))
df1
Out[173]: 
   a    b  e
0  1  0.0  a
1  2  1.0  1
2  3  0.0  2
3  4  1.0  b
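
The map-based fill relies on each key appearing only once in df2, because the key column becomes the lookup index. A minimal sketch of the same idea with that assumption made explicit (the uniqueness assert and the lookup name are my additions); it assigns the result back to the column rather than calling fillna with inplace=True on a selected column, which recent pandas versions may warn about:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [0, 1, np.nan, 1], 'e': ['a', 1, 2, 'b']})
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, 1, 0, 1]})

# Assumption: each key appears once in df2, so it can serve as a lookup index.
assert df2['a'].is_unique

# Series mapping key 'a' -> right-hand 'b'.
lookup = df2.set_index('a')['b']

# Fill only the NaNs in df1['b'], looking each row's key up in df2.
df1['b'] = df1['b'].fillna(df1['a'].map(lookup))

print(df1['b'].tolist())  # [0.0, 1.0, 0.0, 1.0]
```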

Since you mentioned there will be multiple columns, combine_first handles them all at once:

df = df1.combine_first(df1[['a']].merge(df2, on='a', how='left'))
df
Out[184]: 
   a    b  e
0  1  0.0  a
1  2  1.0  1
2  3  0.0  2
3  4  1.0  b
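
The example data has only one shared value column, so here is a minimal sketch with two. The frames and the column 'c' are hypothetical, and the right frame's keys are deliberately in a different order to show that the merge lines rows up by key:

```python
import numpy as np
import pandas as pd

# Hypothetical frames sharing key 'a' and two value columns, 'b' and 'c'.
left = pd.DataFrame({'a': [1, 2, 3],
                     'b': [np.nan, 1.0, np.nan],
                     'c': [5.0, np.nan, np.nan],
                     'e': ['x', 'y', 'z']})
right = pd.DataFrame({'a': [3, 2, 1],
                      'b': [9.0, 8.0, 7.0],
                      'c': [6.0, 4.0, 2.0]})

# Line the right-hand rows up with the left-hand rows by key.
merged = left[['a']].merge(right, on='a', how='left')

# Fill the NaNs in every shared column at once; 'e' is left untouched.
# Reindexing by left.columns keeps the original column order.
result = left.combine_first(merged)[left.columns]

print(result['b'].tolist())  # [7.0, 1.0, 9.0]
print(result['c'].tolist())  # [5.0, 4.0, 6.0]
```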

We can also pass a DataFrame to fillna:

df1.fillna(df1[['a']].merge(df2, on='a', how='left'))
Out[185]: 
   a    b  e
0  1  0.0  a
1  2  1.0  1
2  3  0.0  2
3  4  1.0  b

answered Sep 19 '22 14:09

BENY

Tags: python, merge, pandas, dataframe