How can I assert that the following two DataFrames, df1 and df2, are equal?
import pandas as pd
df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])
The output of df1.equals(df2) is False.
So far, I know two ways:
print((df1 == df2).all()[0])
or
df1 = df1.astype(float)
print(df1.equals(df2))
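The first approach only inspects column 0, because `.all()[0]` picks a single column's result. Below is a minimal sketch of reducing over every column instead, using made-up frames for illustration. Keep in mind that `==` requires identically labeled frames and treats NaN as unequal to NaN, so it is not a drop-in replacement for equals().

```python
import pandas as pd

# Illustrative frames: column "a" matches, column "b" differs in the last row
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1.0, 2, 3], "b": [4.0, 5, 99]})

# Element-wise comparison gives a DataFrame of booleans
eq = df1 == df2

# .all() reduces each column to one bool; indexing picks only the first column
first_column_only = bool(eq.all().iloc[0])
# A second .all() reduces the per-column results to a single bool
all_columns = bool(eq.all().all())

print(first_column_only, all_columns)  # True False
```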
It seems a little bit messy. Is there a better way to do this comparison?
Pandas DataFrame: equals() function
The equals() function tests whether two objects contain the same elements. It allows two Series or DataFrames to be compared against each other to see whether they have the same shape and elements. NaNs in the same location are considered equal. Corresponding columns must also have the same dtype, which is why df1.equals(df2) returns False when one column is int64 and the other is float64.
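A short sketch of the behaviour described above, using the frames from the question plus two small NaN-containing frames made up for illustration: equals() is sensitive to dtype and treats aligned NaNs as equal, while element-wise `==` does not.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])

# Same values, but int64 vs float64 columns: equals() reports False
same_values_diff_dtype = df1.equals(df2)

# After casting to a common dtype, equals() reports True
after_cast = df1.astype(float).equals(df2)

# NaNs in the same position count as equal for equals() ...
a = pd.DataFrame([1.0, np.nan])
b = pd.DataFrame([1.0, np.nan])
nan_equals = a.equals(b)
# ... but not for element-wise ==, because NaN != NaN
nan_elementwise = bool((a == b).all().all())

print(same_values_diff_dtype, after_cast, nan_equals, nan_elementwise)
# False True True False
```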
Step 1: Define two pandas Series, s1 and s2.
Step 2: Compare them with the Series.compare() method.
Step 3: Print their difference.
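The three steps above might look like this in code. This is a sketch with made-up values; it assumes pandas 1.1 or later (where Series.compare() exists) and two Series with identical labels, since compare() raises a ValueError otherwise.

```python
import pandas as pd

# Step 1: define two Series with the same index labels
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([1.0, 2.5, 3])

# Step 2: compare them; only positions whose values differ are kept
diff = s1.compare(s2)

# Step 3: print the difference (one column for each Series)
print(diff)

# compare() looks at values, not dtypes: int vs float with equal values -> empty
no_diff = s1.compare(pd.Series([1.0, 2, 3]))
print(no_diff.empty)  # True
```

Because compare() ignores the dtype, an empty result is one way to check the int-vs-float case from the question, at least for a single pair of Series.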
Index.equals() determines whether two Index objects contain the same elements. It returns True if they do, and False otherwise, indicating that the values in the two indexes differ.
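A minimal illustration of Index.equals(), with index values made up for the example. Unlike DataFrame.equals() on column values, Index.equals() compares the elements and not the dtype, so an integer index and a float index holding the same numbers compare as equal.

```python
import pandas as pd

int_idx = pd.Index([1, 2, 3])
float_idx = pd.Index([1.0, 2.0, 3.0])

# Same elements, different dtypes: Index.equals still reports True
same_elements = int_idx.equals(float_idx)

# A different element makes it False
different_element = int_idx.equals(pd.Index([1, 2, 4]))

print(same_elements, different_element)  # True False
```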
You can use assert_frame_equal and not check the dtype of the columns:
# Pre v. 0.20.3
# from pandas.util.testing import assert_frame_equal
from pandas.testing import assert_frame_equal

assert_frame_equal(df1, df2, check_dtype=False)
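To show what check_dtype changes, here is a sketch on the frames from the question. assert_frame_equal returns None on success and raises AssertionError on a mismatch, so the strict call is wrapped in try/except here only to capture the outcome as a bool.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])

# Passes: the values match and the int64/float64 difference is ignored
assert_frame_equal(df1, df2, check_dtype=False)

# With the default check_dtype=True, the same comparison raises AssertionError
try:
    assert_frame_equal(df1, df2)
    strict_passed = True
except AssertionError:
    strict_passed = False

print(strict_passed)  # False
```

In a test suite you would normally call assert_frame_equal directly and let the AssertionError fail the test; the try/except is only for demonstration.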
Using @Divakar's elegant idea, NumPy's allclose() does the main work for the numeric columns (with NumPy imported as np):
In [128]: df1
Out[128]:
   0    s  n
0  1  aaa  1
1  2  aaa  2
2  3  aaa  3

In [129]: df2
Out[129]:
     0    s    n
0  1.0  aaa  1.0
1  2.0  aaa  2.0
2  3.0  aaa  3.0
In [130]: (np.allclose(df1.select_dtypes(exclude=[object]), df2.select_dtypes(exclude=[object]))
.....: &
.....: df1.select_dtypes(include=[object]).equals(df2.select_dtypes(include=[object]))
.....: )
Out[130]: True
select_dtypes() separates the string (object) columns from the numeric ones, so the numeric part can be compared with allclose() and the string part with equals().
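The same idea can be wrapped in a reusable function. This is a sketch, not the answer's own code: the name frames_close is hypothetical, it splits columns with select_dtypes(include="number") / exclude="number" instead of exclude=[object] (so string columns stored as a dedicated string dtype also land on the exact-comparison side), and it adds a label check because allclose() ignores index and column labels.

```python
import numpy as np
import pandas as pd


def frames_close(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper: numeric columns via np.allclose, the rest exactly."""
    # allclose() ignores labels, so check shape, index and columns first
    if a.shape != b.shape:
        return False
    if not (a.index.equals(b.index) and a.columns.equals(b.columns)):
        return False
    num_a = a.select_dtypes(include="number")
    num_b = b.select_dtypes(include="number")
    # A column that is numeric in one frame but not the other is a mismatch
    if not num_a.columns.equals(num_b.columns):
        return False
    rest_a = a.select_dtypes(exclude="number")
    rest_b = b.select_dtypes(exclude="number")
    # Cast numeric parts to float so int and float columns compare by value
    numeric_ok = np.allclose(num_a.to_numpy(dtype=float), num_b.to_numpy(dtype=float))
    return bool(numeric_ok) and rest_a.equals(rest_b)


# Frames shaped like the ones in the session above
df1 = pd.DataFrame({0: [1, 2, 3], "s": ["aaa"] * 3, "n": [1, 2, 3]})
df2 = df1.astype({0: float, "n": float})
print(frames_close(df1, df2))  # True
```

allclose() uses a relative and absolute tolerance, so tiny floating-point differences in numeric columns are accepted, while string columns must match exactly.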