Comparing two pandas dataframes with different integer types

Tags: python, pandas

I just ran into some weird behaviour comparing the values of two pandas dataframes using pd.DataFrame.equals():

Comparison 1

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df1.copy()

df1.equals(df2)
# True (obviously)

However, when I change the column type to a different integer format, they will not be considered equal anymore:

df1['a'] = df1['a'].astype(np.int32)
df1.equals(df2)
# False

In the .equals() documentation, they point out that the variables must have the same type, and present an example comparing floats to integers, which doesn't work. I didn't expect this to extend to different types of integers, too.

Comparison 2

When doing the same comparison using ==, it does return True:

(df1 == df2).all().all()   
# True

However, == doesn't assess two missing values as equal to each other.
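To make the missing-value point concrete, here is a minimal sketch. The frames `left` and `right` are illustrative names that do not appear in the question; they hold identical data with one NaN.

```python
import numpy as np
import pandas as pd

# Two frames with identical contents, including a NaN in the same position.
left = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
right = left.copy()

# Element-wise ==: NaN == NaN evaluates to False, so the overall check fails.
elementwise = bool((left == right).all().all())

# .equals(): NaNs in the same location are treated as equal.
whole_frame = left.equals(right)

print(elementwise)  # False
print(whole_frame)  # True
```

So == ignores the integer width but trips over NaN, while .equals() handles NaN but insists on matching dtypes.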

My question

Is there an elegant way to handle missing values as equal, whilst not enforcing the same integer type? The best I can come up with is:

(df1.fillna(0) == df2.fillna(0)).all().all()

but there has to be a more concise and less hacky way to deal with this problem.

My follow up, opinion-based question: Would you consider this a bug?

asked Feb 26 '20 at 11:02 by KenHBS


1 Answer

If you think of this as a decimal problem (i.e. does 2 equal 2), then this perhaps looks like a bug. However, if you look at it from how the interpreter sees it (i.e. does 00000010 equal 0000000000000010), it becomes plain that there is indeed a difference: the same number is stored at two different bit widths, and .equals() takes the storage type into account.
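A small sketch of that distinction at the NumPy level, with `narrow` and `wide` as illustrative names: the values compare equal, but the dtypes and their storage widths do not.

```python
import numpy as np

# The same number stored at two different integer widths.
narrow = np.array([2], dtype=np.int32)
wide = np.array([2], dtype=np.int64)

same_value = bool((narrow == wide).all())    # value comparison
same_dtype = narrow.dtype == wide.dtype      # storage type comparison
widths = (narrow.dtype.itemsize, wide.dtype.itemsize)  # bytes per element

print(same_value)  # True
print(same_dtype)  # False
print(widths)      # (4, 8)
```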

From a validation perspective, it is probably a good idea to make sure you are comparing apples to apples and so I like the answer of @Ben.T:

df1.equals(df2.astype(df1.dtypes))
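As a rough, self-contained sketch of how this behaves when missing values are present, the example below adds a float column with a NaN (an assumption for illustration, the question's frames contain only integers). It also shows one possible alternative, `pandas.testing.assert_frame_equal` with `check_dtype=False`, wrapped in a hypothetical helper named `frames_match`; note that this function compares floats within a tolerance by default, so it is not a strict bit-for-bit check.

```python
import numpy as np
import pandas as pd

# Illustrative data: an integer column plus a float column with a NaN.
df2 = pd.DataFrame({
    "a": np.array([1, 2, 3], dtype=np.int64),
    "b": [4.0, np.nan, 6.0],
})
df1 = df2.copy()
df1["a"] = df1["a"].astype(np.int32)

# Plain .equals() fails because column "a" is int32 in df1 and int64 in df2.
strict = df1.equals(df2)

# Casting df2 to df1's dtypes first makes the comparison dtype-agnostic,
# while .equals() still treats NaNs in the same position as equal.
aligned = df1.equals(df2.astype(df1.dtypes))


def frames_match(left, right):
    """Hypothetical helper: True if the frames match, ignoring dtypes."""
    try:
        pd.testing.assert_frame_equal(left, right, check_dtype=False)
    except AssertionError:
        return False
    return True


print(strict)                  # False
print(aligned)                 # True
print(frames_match(df1, df2))  # True
```

The cast keeps everything in a single expression; the testing helper is more verbose but reports where the frames differ if you call assert_frame_equal directly.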

Is this a bug? That is above my pay grade. You can submit it, and the thinkers surrounding the pandas library can make a decision. It does seem odd that the '==' operator gives different results than the '.equals' function, and that may sway the decision.

answered Oct 11 '22 at 11:10 by Sid Kwakkel