
How to output all differences if `pandas.testing.assert_frame_equal` fails?

I am unit testing a DataFrame output. I have two DataFrames whose values differ in multiple columns:

df1 = pd.DataFrame({"col1": [1, 1], "col2":[1, 1]})
df2 = pd.DataFrame({"col1": [1, 2], "col2":[1, 2]})

When I run pandas.testing.assert_frame_equal, I get the following error, which reports only one column:

DataFrame.iloc[:, 0] (column name="col1") values are different (50.0 %)
[index]: [0, 1]
[left]:  [1, 1]
[right]: [1, 2]

However, I get no information about the second column. Is there a way to show all the mismatches, and not just the first one from the leftmost column?

ledermauss asked Oct 12 '25 18:10


1 Answer

Another (hacky, but slightly more performant) way to do this:

import pandas as pd


def assert_frame_equal_extended_diff(df1, df2):
    try:
        pd.testing.assert_frame_equal(df1, df2)

    except AssertionError as e:
        # if this was a shape or index/col error, then re-raise
        try:
            pd.testing.assert_index_equal(df1.index, df2.index)
            pd.testing.assert_index_equal(df1.columns, df2.columns)
        except AssertionError:
            raise e

        # if not, we have a value error; cells that are NaN on both
        # sides are not counted as differences
        diff = (df1 != df2) & ~(df1.isna() & df2.isna())
        diffcols = diff.any(axis=0)
        diffrows = diff.any(axis=1)
        cmp = pd.concat(
            {'left': df1.loc[diffrows, diffcols], 'right': df2.loc[diffrows, diffcols]},
            names=['dataframe'],
            axis=1,
        )

        raise AssertionError(e.args[0] + f'\n\nDifferences:\n{cmp}') from None

This will use pandas.DataFrame's repr to display just the differences:

In [5]: df1 = pd.DataFrame({
   ...:     'samecol': np.arange(1500),
   ...:     'diffcol': np.arange(1500),
   ...:     'anothercol': np.ones(shape=1500),
   ...: })

In [6]: df2 = df1.copy()
   ...: df2.iloc[1000:1014, 1] = range(14)

In [7]: assert_frame_equal_extended_diff(df1, df2)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 assert_frame_equal_extended_diff(df1, df2)

Input In [6], in assert_frame_equal_extended_diff(df1, df2)
     11 diffrows = diff.any(axis=1)
     12 cmp = pd.concat(
     13     {'left': df1.loc[diffrows, diffcols], 'right': df2.loc[diffrows, diffcols]},
     14     names=['dataframe'],
     15     axis=1,
     16 )
---> 18 raise AssertionError(e.args[0] + f'\n\nDifferences:\n{cmp}') from None

AssertionError: DataFrame.iloc[:, 1] (column name="diffcol") are different

DataFrame.iloc[:, 1] (column name="diffcol") values are different (0.93333 %)
[index]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
[left]:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
[right]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]

Differences:
dataframe    left   right
          diffcol diffcol
1000         1000       0
1001         1001       1
1002         1002       2
1003         1003       3
1004         1004       4
1005         1005       5
1006         1006       6
1007         1007       7
1008         1008       8
1009         1009       9
1010         1010      10
1011         1011      11
1012         1012      12
1013         1013      13

Note: this answer is intended as a debugging aid, not a comprehensive or edge-case-free approach. Edits are welcome, but use it at your own risk.
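If you are on pandas 1.1 or later, DataFrame.compare may get you a similar report with less hand-written code. The following is only a sketch under that assumption. The helper name assert_frame_equal_with_compare is made up for illustration and is not part of pandas. It falls back to the original error when compare refuses the inputs, which it does when the index or columns are not identically labeled.

```python
import pandas as pd


def assert_frame_equal_with_compare(left, right, **kwargs):
    """Hypothetical helper: append a DataFrame.compare report to the failure."""
    try:
        pd.testing.assert_frame_equal(left, right, **kwargs)
    except AssertionError as err:
        try:
            # compare() keeps only the rows and columns that contain a
            # difference, and treats NaN in the same position as equal
            diff = left.compare(right)
        except ValueError:
            # compare() only accepts identically-labeled frames, so a
            # shape or label mismatch keeps the original message
            raise err from None
        raise AssertionError(f"{err}\n\nDifferences:\n{diff}") from None
```

With the two frames from the question, the report should contain both col1 and col2, each split into "self" and "other" sub-columns. Note that compare looks at values only, so a failure caused purely by a dtype mismatch would produce an empty report.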

