I have two columns in a pandas dataframe that are supposed to be identical. Each column has many NaN values. I would like to compare the columns, producing a 3rd column containing True / False values; True when the columns match, False when they do not.
This is what I have tried:
df['new_column'] = (df['column_one'] == df['column_two'])
The above works for the numbers, but not the NaN values.
I know I could replace the NaNs with a value that could never legitimately appear in a row (for my data this could be -9999) and then remove it later when I'm ready to output the comparison results. However, I was wondering if there is a more Pythonic method I'm overlooking.
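To make the problem concrete, here is a minimal sketch of both the plain comparison and the sentinel workaround described above. The sample values are made up for illustration; only the column names and the -9999 sentinel come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "column_one": [1.0, 2.0, np.nan, 4.0],
    "column_two": [1.0, 3.0, np.nan, np.nan],
})

# Plain == follows floating-point rules: NaN never equals NaN,
# so the row where both columns are NaN comes out False.
naive = df["column_one"] == df["column_two"]

# Sentinel workaround: fill NaNs only for the duration of the comparison,
# so the original columns are left untouched and nothing has to be removed later.
SENTINEL = -9999  # assumed never to occur in the real data
with_sentinel = (
    df["column_one"].fillna(SENTINEL) == df["column_two"].fillna(SENTINEL)
)

print(naive.tolist())
print(with_sentinel.tolist())
```

The sentinel version only gives the right answer if the sentinel really cannot appear in the data, which is why a NaN-aware comparison is usually preferable.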
NumPy's where() function takes a condition for comparing the columns. If 'column1' is less than 'column2' and 'column1' is less than 'column3', the value of 'column1' is used. If the condition fails, the value is NaN. The results are stored in a new column in the dataframe.
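A minimal sketch of that np.where() pattern might look like the following. The sample values and the output column name 'new' are assumptions made for illustration; the column names come from the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "column1": [1, 5, 2],
    "column2": [3, 4, 6],
    "column3": [2, 9, 1],
})

# True only where column1 is smaller than both other columns.
condition = (df["column1"] < df["column2"]) & (df["column1"] < df["column3"])

# Keep column1's value where the condition holds, NaN elsewhere.
# 'new' is an illustrative column name.
df["new"] = np.where(condition, df["column1"], np.nan)

print(df)
```

Because NaN is a float, the new column ends up with a float dtype even though the inputs are integers.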
The new column called all_matching shows whether or not the values in all three columns match in a given row. For example: All three values match in the first row, so True is returned. Not every value matches in the second row, so False is returned.
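The text above does not show how all_matching is computed, so the following is only one possible sketch. The column names A, B, C and the sample values are assumptions; the idea is to compare every column against the first one and require all comparisons in a row to be True.

```python
import pandas as pd

# Hypothetical sample data with three columns to compare.
df = pd.DataFrame({
    "A": [4, 0, 3],
    "B": [4, 2, 3],
    "C": [4, 2, 5],
})

cols = ["A", "B", "C"]

# Compare each column to column A row by row (axis=0 aligns on the index),
# then require every comparison in the row to hold.
df["all_matching"] = df[cols].eq(df["A"], axis=0).all(axis=1)

print(df)
```

Like ==, eq() treats NaN as unequal to NaN, so rows containing NaN would need the same special handling discussed in the question.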
Or you could just use the equals method:
df['new_column'] = df['column_one'].equals(df['column_two'])
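It is a batteries-included approach: equals treats NaNs in the same position as equal. Note, though, that it returns a single True / False for the whole column rather than one value per row, so the assignment above fills new_column with that one result, and the two columns need to have the same dtype for it to return True. You can also put it in a loop, if you want.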
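If a per-row result is what you need, one possible sketch is to combine the ordinary comparison with a check that both cells are NaN. This is not part of the answer above, just a common way to get NaN-aware row-wise equality; the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "column_one": [1.0, 2.0, np.nan, 4.0],
    "column_two": [1.0, 3.0, np.nan, np.nan],
})
a, b = df["column_one"], df["column_two"]

# equals() answers one question for the whole column: are they identical?
whole_column_equal = a.equals(b)

# Row-wise, NaN-aware comparison: equal values, or NaN on both sides.
df["new_column"] = (a == b) | (a.isna() & b.isna())

print(whole_column_equal)
print(df)
```

Rows where only one side is NaN still come out False, which is what you want when the columns are supposed to be identical.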