Compare two columns with NaNs in Pandas and get differences

I have the following DataFrame:

case c1   c2
1    x    x
2    NaN  y
3    x    NaN
4    y    x
5    NaN  NaN 

I would like to add a column "match" that shows whether the values in "c1" and "c2" are equal, with two NaNs counting as equal:

case c1   c2   match
1    x    x    True  
2    NaN  y    False
3    x    NaN  False
4    y    x    False
5    NaN  NaN  True 

I tried the following, based on another Stack Overflow question, "Comparing two columns and keeping NaNs". However, I can't get both cases 4 and 5 correct.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'case': [1, 2, 3, 4, 5],
    'c1': ['x', np.nan, 'x', 'y', np.nan],
    'c2': ['x', 'y', np.nan, 'x', np.nan],
})

cond1 = df['c1'] == df['c2']
cond2 = (df['c1'].isnull()) == (df['c2'].isnull())

df['c3'] = np.select([cond1, cond2], [True, True], False)

df

asked Oct 28 '25 10:10 by verkter


1 Answer

Use eq together with isna, so that rows where both values are NaN also count as a match:

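# assumes c1 and c2 are the only columns after 'case'
df.c1.eq(df.c2) | df.iloc[:, 1:].isna().all(axis=1)
# or, selecting the two columns by name
df.c1.eq(df.c2) | df.loc[:, ['c1', 'c2']].isna().all(axis=1)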
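
As a rough end-to-end sketch, the expression above can be applied to the DataFrame from the question and stored in a "match" column. The variable name both_nan is only illustrative. The sketch also recomputes cond2 from the question to show why case 4 goes wrong there: comparing the two isnull() masks is True when neither value is missing, not only when both are.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'case': [1, 2, 3, 4, 5],
    'c1': ['x', np.nan, 'x', 'y', np.nan],
    'c2': ['x', 'y', np.nan, 'x', np.nan],
})

# eq() alone treats NaN vs NaN as not equal, so case 5 needs a separate check.
# both_nan (illustrative name) is True only where c1 and c2 are both missing.
both_nan = df[['c1', 'c2']].isna().all(axis=1)
df['match'] = df['c1'].eq(df['c2']) | both_nan

# cond2 from the question: also True when both values are present,
# which is why case 4 ('y' vs 'x') was marked as a match.
cond2 = df['c1'].isnull() == df['c2'].isnull()

print(df)
print(cond2.tolist())
```

Selecting the columns by name, as in the second variant of the answer, keeps the check independent of any extra columns that may have been added to the DataFrame.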

answered Oct 30 '25 01:10 by Space Impact


