
Why does a successful assertEqual not always imply a successful assertItemsEqual?

The Python 2.7 docs state that assertItemsEqual "is the equivalent of assertEqual(sorted(expected), sorted(actual))". In the example below, all tests pass except test4. Why does assertItemsEqual fail in this case?

Per the principle of least astonishment, given two iterables, I would expect that a successful assertEqual implies a successful assertItemsEqual.

import unittest

class foo(object):
    def __init__(self, a):
        self.a = a

    def __eq__(self, other):
        return self.a == other.a

class test(unittest.TestCase):
    def setUp(self):
        self.list1 = [foo(1), foo(2)]
        self.list2 = [foo(1), foo(2)]

    def test1(self):
        self.assertTrue(self.list1 == self.list2)

    def test2(self):
        self.assertEqual(self.list1, self.list2)

    def test3(self):
        self.assertEqual(sorted(self.list1), sorted(self.list2))

    def test4(self):
        self.assertItemsEqual(self.list1, self.list2)

if __name__=='__main__':
    unittest.main()

Here is the output on my machine:

FAIL: test4 (__main__.test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "assert_test.py", line 25, in test4
    self.assertItemsEqual(self.list1, self.list2)
AssertionError: Element counts were not equal:
First has 1, Second has 0:  <__main__.foo object at 0x7f67b3ce2590>
First has 1, Second has 0:  <__main__.foo object at 0x7f67b3ce25d0>
First has 0, Second has 1:  <__main__.foo object at 0x7f67b3ce2610>
First has 0, Second has 1:  <__main__.foo object at 0x7f67b3ce2650>

----------------------------------------------------------------------
Ran 4 tests in 0.001s

FAILED (failures=1)
Matthew Nizol asked Apr 17 '15 03:04



1 Answer

The documented behavior is interestingly detached from the implementation, which never does any sorting. Looking at the source code, it first tries to count by hashing using collections.Counter. If this fails with a TypeError (because either list contains an unhashable item), it moves on to a second algorithm that compares items using Python's == in O(n^2) loops.
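The two-stage strategy described above can be sketched roughly as follows. This is a simplified illustration, not the actual unittest source: the helper name items_equal is hypothetical, and the real method reports the differing elements instead of returning a boolean.

```python
import collections


def items_equal(first, second):
    """Hypothetical, simplified sketch of the two-stage comparison."""
    first, second = list(first), list(second)
    try:
        # Stage 1: count elements by hash.
        return collections.Counter(first) == collections.Counter(second)
    except TypeError:
        # Stage 2: some element is unhashable, so pair items up with ==.
        remaining = list(second)
        for item in first:
            for index, other in enumerate(remaining):
                if item == other:
                    del remaining[index]  # consume the matched element
                    break
            else:
                return False  # no partner found for this item
        return not remaining  # leftovers mean the second sequence had extras


print(items_equal([1, 2, 2], [2, 1, 2]))      # hashable items: stage 1
print(items_equal([[1], [2]], [[2], [1]]))    # lists are unhashable: stage 2
```

Note that stage 2 is only reached when hashing raises TypeError. If hashing succeeds, == on the elements is consulted only for elements whose hashes already match.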

So if your foo class were unhashable, the second algorithm would signal a match. But it is perfectly hashable. From the docs:

Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().

I verified this by calling collections.Counter([foo(1)]): no TypeError is raised.
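The consequence can be seen in a small sketch. In Python 2 the class from the question inherits the identity-based hash implicitly. The explicit `__hash__ = object.__hash__` line is an assumption added here only so that the snippet behaves the same way under Python 3, where the class would otherwise become unhashable. The variable names are illustrative, and the comments describe CPython behavior.

```python
import collections


class foo(object):
    # Assumption for illustration: spell out the identity-based hash that
    # Python 2 gives this class by default.
    __hash__ = object.__hash__

    def __init__(self, a):
        self.a = a

    def __eq__(self, other):
        return self.a == other.a


x, y = foo(1), foo(1)

print(x == y)  # True: equality is value-based

# Counter accepts the objects (no TypeError), but it files them under
# their identity-based hashes, so the two equal objects become two keys.
counts = collections.Counter([x, y])
print(len(counts))  # 2
```

Because each object is its own key, the Counter built from list1 shares no keys with the Counter built from list2, which matches the "First has 1, Second has 0" lines in the failure message.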

So here is where your code comes off the rails. From the docs on __hash__:

if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable in hashed collections.

Unfortunately "not usable" apparently does not equate to "unhashable."

It goes on to say:

Classes which inherit a __hash__() method from a parent class but change the meaning of __cmp__() or __eq__() such that the hash value returned is no longer appropriate (e.g. by switching to a value-based concept of equality instead of the default identity based equality) can explicitly flag themselves as being unhashable by setting __hash__ = None in the class definition.

If we redefine:

class foo(object):
    __hash__ = None
    def __init__(self, a):
        self.a = a
    def __eq__(self, other):
        return isinstance(other, foo) and self.a == other.a

all tests pass!
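A quick check, as a sketch, suggests why this helps: with __hash__ = None, collections.Counter can no longer hash the instances and raises TypeError, which is the condition that sends assertItemsEqual to its ==-based fallback. The flag name counter_failed is illustrative.

```python
import collections


class foo(object):
    __hash__ = None  # explicitly mark instances as unhashable

    def __init__(self, a):
        self.a = a

    def __eq__(self, other):
        return isinstance(other, foo) and self.a == other.a


try:
    collections.Counter([foo(1), foo(2)])
    counter_failed = False
except TypeError:
    # Hashing failed, so counting by hash is impossible for these objects.
    counter_failed = True

print(counter_failed)
```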

So it appears the docs are not exactly wrong, but they're not abundantly clear either. They ought to mention that counting is done with hashing, and that simple equality matching is tried only if hashing fails. This approach is only valid if the objects either have complete hashing semantics (equal objects hash equally) or are completely unhashable. Yours were in the middle ground. (Python 3 is more rigorous here: a class that defines __eq__() without also defining __hash__() has its __hash__ implicitly set to None, so its instances are unhashable.)
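The other way out of the middle ground is to give the class complete hashing semantics. A minimal sketch, assuming the attribute a is itself hashable and is not mutated while the object sits in a hashed collection:

```python
import collections


class foo(object):
    def __init__(self, a):
        self.a = a

    def __eq__(self, other):
        return isinstance(other, foo) and self.a == other.a

    def __hash__(self):
        # Hash the same value that __eq__ compares, so that equal
        # objects always produce equal hashes.
        return hash(self.a)


first = collections.Counter([foo(1), foo(2)])
second = collections.Counter([foo(2), foo(1)])

print(first == second)  # True: equal objects now share a key
```

With this definition the hash-based counting path sees foo(1) in one list and foo(1) in the other as the same element, so no fallback is needed.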

Gene answered Nov 15 '22 05:11
