Comparing two pandas dataframes with different integer types

Comparison 1

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df1.copy()

df1.equals(df2)
# True (obviously)

However, when I change the column type to a different integer format, they will not be considered equal anymore:

df1['a'] = df1['a'].astype(np.int32)
df1.equals(df2)
# False

The .equals() documentation points out that corresponding columns must have the same dtype, and gives an example comparing floats to integers, which returns False. I didn't expect this to extend to different integer types, too.

Comparison 2

When doing the same comparison using ==, it does return True:

(df1 == df2).all().all()
# True

However, == doesn't assess two missing values as equal to each other.
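To illustrate the point above, here is a minimal sketch. The frame names `left` and `right` are made up for the example; it only assumes a float column containing one NaN.

```python
import numpy as np
import pandas as pd

# Two identical frames that each hold a missing value in the same position
left = pd.DataFrame({'a': [1.0, np.nan]})
right = left.copy()

# Elementwise ==: NaN compared to NaN yields False, so the overall result is False
elementwise_equal = bool((left == right).all().all())

# .equals() treats NaNs in the same location as equal
equals_result = left.equals(right)

print(elementwise_equal)  # False
print(equals_result)      # True
```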

My question

Is there an elegant way to handle missing values as equal, whilst not enforcing the same integer type? The best I can come up with is:

(df1.fillna(0) == df2.fillna(0)).all().all()

but there has to be a more concise and less hacky way to deal with this problem.

My follow-up, opinion-based question: Would you consider this a bug?


asked Feb 26 '20 11:02

KenHBS

1 Answer

If you think of this as a decimal problem (i.e. does 2 equal 2), then this perhaps looks like a bug. However, if you look at it from how the interpreter sees it (i.e. does 00000010 equal 0000000000000010), then it becomes plain that there is indeed a difference: the two values are stored with different bit widths.
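A small sketch of that distinction, using NumPy scalars directly rather than the DataFrames from the question (the variable names are illustrative only): the values compare equal, while the dtypes and storage widths differ.

```python
import numpy as np

small = np.int32(2)
wide = np.int64(2)

# The numeric values are equal...
same_value = bool(small == wide)

# ...but the dtypes are not, and each value occupies a different number of bytes
same_dtype = small.dtype == wide.dtype
widths = (small.dtype.itemsize, wide.dtype.itemsize)

print(same_value)  # True
print(same_dtype)  # False
print(widths)      # (4, 8)
```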

From a validation perspective, it is probably a good idea to make sure you are comparing apples to apples, so I like the answer of @Ben.T:

df1.equals(df2.astype(df1.dtypes))
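As a rough sketch of how this behaves when missing values are present, the example below adds a float column with a NaN to the question's data. The helper `frames_equal_ignoring_dtype` is hypothetical, not part of pandas; it is one possible alternative that avoids casting by treating positions where both frames are missing as equal. It assumes both frames share the same index and column labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, np.nan, 6.0]})
df2 = df1.copy()
df1['a'] = df1['a'].astype(np.int32)

# Strict comparison fails because column 'a' is int32 in df1 and int64 in df2
strict = df1.equals(df2)

# Casting df2 to df1's dtypes first makes .equals() succeed, NaN included
aligned = df1.equals(df2.astype(df1.dtypes))


def frames_equal_ignoring_dtype(left, right):
    """Hypothetical helper: equal values or both missing, dtype not checked."""
    both_missing = left.isna() & right.isna()
    return bool(((left == right) | both_missing).all().all())


masked = frames_equal_ignoring_dtype(df1, df2)

print(strict)   # False
print(aligned)  # True
print(masked)   # True
```

One caveat with the casting approach: if a column is an integer type in one frame and holds NaN in the other, the cast itself raises an error instead of returning False, whereas the mask-based helper simply returns False.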

Is this a bug? That is above my pay grade. You can submit it, and the thinkers surrounding the pandas library can make a decision. It does seem odd that the '==' operator gives different results than the '.equals' function, and that may sway the decision.


answered Oct 11 '22 11:10

Sid Kwakkel


Tags:

python

pandas