 

Compare columns of Pandas dataframe for equality to produce True/False, even NaNs

I have two columns in a pandas dataframe that are supposed to be identical. Each column has many NaN values. I would like to compare the columns, producing a 3rd column containing True / False values; True when the columns match, False when they do not.

This is what I have tried:

df['new_column'] = (df['column_one'] == df['column_two'])

The above works for the numbers, but not for the NaN values: rows where both columns are NaN come out as False.
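To make the behaviour concrete, here is a minimal sketch on a made-up three-row frame (the sample values are assumptions; only the column names come from the question). NaN compares unequal to everything, including itself, so the row where both columns are NaN is marked False:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column_one": [1.0, np.nan, 3.0],
    "column_two": [1.0, np.nan, 4.0],
})

# Element-wise ==: NaN == NaN evaluates to False.
df["new_column"] = df["column_one"] == df["column_two"]
print(df["new_column"].tolist())  # [True, False, False]
```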

I know I could replace the NaNs with a placeholder value that cannot occur in my data (for my data this could be -9999) and then remove it later, when I'm ready to output the comparison results. However, I was wondering whether there is a more Pythonic method I am overlooking.
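The placeholder workaround described above could look roughly like this. It is only a sketch: the sample values are made up, and -9999 is assumed never to appear as a real value in either column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column_one": [1.0, np.nan, 3.0, np.nan],
    "column_two": [1.0, np.nan, 4.0, 5.0],
})

SENTINEL = -9999  # assumed never to occur as a genuine value

# fillna returns new Series, so the original columns keep their NaNs.
df["new_column"] = (
    df["column_one"].fillna(SENTINEL) == df["column_two"].fillna(SENTINEL)
)
print(df["new_column"].tolist())  # [True, True, False, False]
```

Because `fillna` is applied only to temporary copies inside the comparison, there is nothing to clean up afterwards. The approach still silently breaks if the placeholder ever shows up as real data.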

traggatmot asked Sep 15 '16 02:09





1 Answer

Or you could just use the equals method:

df['new_column'] = df['column_one'].equals(df['column_two'])

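It is a batteries-included approach, and it treats NaNs in the same position as equal. Note, however, that equals returns a single True/False for the two columns as a whole rather than one value per row, so the assignment above fills every row of new_column with that one value. It also expects both columns to have the same dtype. You can also put it in a loop, if you want.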
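If a per-row True/False column is what you need, one possible sketch (not part of the original answer) combines an element-wise comparison with a check that both values are missing. For contrast, it also calls equals on the same columns. The sample data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column_one": [1.0, np.nan, 3.0, np.nan],
    "column_two": [1.0, np.nan, 4.0, 5.0],
})

a, b = df["column_one"], df["column_two"]

# True where the values are equal, or where both are NaN.
df["new_column"] = a.eq(b) | (a.isna() & b.isna())
print(df["new_column"].tolist())  # [True, True, False, False]

# Series.equals gives one answer for the whole pair of columns.
whole_column_match = a.equals(b)
print(whole_column_match)  # False
```

The row-wise expression needs no placeholder value, and a row with NaN on only one side is still reported as a mismatch.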

Kartik answered Nov 06 '22 05:11
