Pandas Merge with NaN in key

I'm trying to merge two DataFrames that both have NaN in their key column. NaN does not equal NaN, yet the NaN in df1's "key" column is matched with the NaN in df2's "key" column. Why is that, and how can I keep them from matching? I'm using Python 3.6.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'key': [3,2,1,1,np.nan,5], 'value': np.random.randn(6)})
df2 = pd.DataFrame({'key': [1,3,np.nan], 'value': np.random.randn(3)})

df = pd.merge(df1, df2, on='key', how='left')

print(df1)
print(df2)
print(df)

   key     value
0  3.0  0.642917
1  2.0  1.347245
2  1.0 -1.381299
3  1.0  1.839940
4  NaN  0.770599
5  5.0 -0.137404

   key     value
0  1.0  0.580794
1  3.0  0.569973
2  NaN -0.078336

   key   value_x   value_y
0  3.0  0.642917  0.569973
1  2.0  1.347245       NaN
2  1.0 -1.381299  0.580794
3  1.0  1.839940  0.580794
4  NaN  0.770599 -0.078336
5  5.0 -0.137404       NaN

np.nan == np.nan
Out[25]: False
Rich C asked Mar 05 '19 01:03


1 Answer

I once answered a question on the "why" part; you can read more at "Why does pandas merge on NaN?".
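To make the "why" concrete, here is a minimal sketch of the behaviour being asked about. The frames and column names (left, right, left_val, right_val) are made up for illustration. It only shows that merge pairs NaN keys with each other even though an element-wise comparison does not; it makes no claim about how pandas implements this internally.

```python
import numpy as np
import pandas as pd

# Element-wise comparison follows IEEE 754: NaN is never equal to NaN.
print(np.nan == np.nan)

# Hypothetical frames, each with exactly one NaN key.
left = pd.DataFrame({"key": [1.0, np.nan], "left_val": ["a", "b"]})
right = pd.DataFrame({"key": [1.0, np.nan], "right_val": ["x", "y"]})

# merge nevertheless pairs the NaN key on the left with the NaN key on the
# right, so the inner join keeps both rows.
merged = left.merge(right, on="key", how="inner")
print(merged)
```

This differs from a typical SQL join, where NULL keys do not match each other.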

To fix this, why not just call dropna on the key column of df2 before merging?

df1.merge(df2.dropna(subset=['key']), on='key', how='left')

   key   value_x   value_y
0  3.0 -0.177450 -1.879047
1  2.0  0.179939       NaN
2  1.0 -1.033730 -1.433606
3  1.0  1.426648 -1.433606
4  NaN -0.320173       NaN
5  5.0 -1.824740       NaN
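If you want to check that the NaN row really went unmatched, one option is to wrap the same dropna approach in a small helper and pass indicator=True. This is only a sketch: the helper name left_merge_skip_nan_keys is hypothetical, the values are fixed instead of random so the result is reproducible, and it only covers how='left' (other join types would also need the NaN keys on the left handled).

```python
import numpy as np
import pandas as pd

# Same keys as in the question, with fixed values instead of random ones.
df1 = pd.DataFrame({"key": [3, 2, 1, 1, np.nan, 5],
                    "value": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
df2 = pd.DataFrame({"key": [1, 3, np.nan],
                    "value": [10.0, 30.0, 99.0]})


def left_merge_skip_nan_keys(left, right, key):
    """Left-merge `left` with `right` without matching NaN keys.

    Hypothetical helper: rows of `right` whose key is NaN are dropped first,
    so a NaN key in `left` has nothing to match against.
    """
    right_clean = right.dropna(subset=[key])
    # indicator=True adds a `_merge` column saying where each row came from.
    return left.merge(right_clean, on=key, how="left", indicator=True)


result = left_merge_skip_nan_keys(df1, df2, "key")
print(result)
```

In the printed frame, the row with the NaN key should be marked left_only in the _merge column and have NaN in value_y.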
cs95 answered Nov 04 '22 15:11