
Unittest equality of empty record arrays

I noticed the following unittest.TestCase assertion failing and am wondering how to correctly compare empty recarrays:

This fails:

self.assertEqual(
    np.array(
        [],
        dtype=[
            ('time', 'datetime64[ns]'),
            ('end_time', int)
        ]
    ).view(np.recarray),
    np.array(
        [],
        dtype=[
            ('time', 'datetime64[ns]'),
            ('end_time', int)
        ]
    ).view(np.recarray)
)

This passes:

self.assertEqual(
    np.array(
        [(1,1)],
        dtype=[
            ('time', 'datetime64[ns]'),
            ('end_time', int)
        ]
    ).view(np.recarray),
    np.array(
        [(1,1)],
        dtype=[
            ('time', 'datetime64[ns]'),
            ('end_time', int)
        ]
    ).view(np.recarray)
)

Is this a bug or am I doing something wrong here?

Adverbly asked Jan 02 '23 22:01


2 Answers

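unittest.TestCase.assertEqual evaluates first == second and fails the test when the result is falsy. On numpy.ndarray objects, == (the __eq__ method) compares elementwise, so comparing two empty arrays returns an empty boolean array, which is falsy. (Recent NumPy releases deprecate truth-testing an empty array, so you may see a warning or an error here instead of a plain False.)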

>>> arr1
rec.array([],
          dtype=[('time', '<M8[ns]'), ('end_time', '<i8')])
>>> arr2
rec.array([],
          dtype=[('time', '<M8[ns]'), ('end_time', '<i8')])
>>> bool(arr1 == arr2)
False
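
As a minimal sketch of why the comparison is falsy, the snippet below inspects the result of == directly instead of converting it to bool. The names DTYPE and make_empty are illustrative helpers, not part of the question.

```python
import numpy as np

# Illustrative dtype, copied from the question.
DTYPE = [("time", "datetime64[ns]"), ("end_time", int)]


def make_empty():
    """Build an empty record array with the question's dtype."""
    return np.array([], dtype=DTYPE).view(np.recarray)


arr1 = make_empty()
arr2 = make_empty()

# Elementwise comparison: one boolean per record, so zero booleans here.
comparison = arr1 == arr2

print(comparison.shape)  # no elements to compare
print(comparison.size)
# .all() over zero elements is vacuously True, which is usually
# the answer you actually want from an equality check.
print(bool(comparison.all()))
```

The point is that the empty result carries no False entries; it is only the bool() conversion of an empty array that makes assertEqual fail.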

Your second case is a different special case: elementwise equality on two single-element record arrays yields an array of shape (1,). For an array with exactly one item, the truth value is simply the truth value of that item:

>>> bool(np.array([1]))
True
>>> bool(np.array([0]))
False
>>> bool(np.array([{}]))
False
>>> bool(np.array([{'a':1}]))
True
>>> bool(np.array([object()]))
True

So, with your arrays:

>>> arr3 = np.array(
...         [(1,1)],
...         dtype=[
...             ('time', 'datetime64[ns]'),
...             ('end_time', int)
...         ]
...     ).view(np.recarray)
>>> arr4 = np.array(
...         [(1,1)],
...         dtype=[
...             ('time', 'datetime64[ns]'),
...             ('end_time', int)
...         ]
...     ).view(np.recarray)
>>> arr3.size, arr4.size
(1, 1)
>>> arr3 == arr4
rec.array([ True],
          dtype=bool)
>>> bool(arr3 == arr4)
True

Note that whenever the resulting array has a .size greater than 1, evaluating its truth value raises this infamous error:

>>> np.array([1, 1]) == np.array([1, 1])
array([ True,  True], dtype=bool)
>>> bool(np.array([1, 1]) == np.array([1, 1]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
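
One way to stay inside unittest and avoid all three truthiness cases (empty, one element, many elements) is to reduce the comparison to a single bool explicitly. The sketch below uses np.array_equal for that; the class name, DTYPE, and make_records are made up for illustration.

```python
import unittest

import numpy as np

# Illustrative dtype, copied from the question.
DTYPE = [("time", "datetime64[ns]"), ("end_time", int)]


def make_records(rows):
    """Build a record array from a list of (time, end_time) tuples."""
    return np.array(rows, dtype=DTYPE).view(np.recarray)


class RecarrayEqualityTest(unittest.TestCase):
    def test_empty(self):
        # array_equal checks shapes, then reduces the comparison with .all()
        self.assertTrue(np.array_equal(make_records([]), make_records([])))

    def test_two_rows(self):
        rows = [(1, 1), (2, 2)]
        self.assertTrue(np.array_equal(make_records(rows), make_records(rows)))

    def test_different_values(self):
        left = make_records([(1, 1)])
        right = make_records([(1, 2)])
        self.assertFalse(np.array_equal(left, right))


# Run the suite without calling sys.exit(), unlike unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RecarrayEqualityTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The trade-off is that a failing assertTrue only reports "False is not true" and says nothing about which records differ.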
juanpa.arrivillaga answered Jan 05 '23 17:01



@juanpa.arrivillaga is correct. In addition, it is best to test NumPy arrays with the numpy.testing module, whose assertions compare arrays as a whole instead of relying on their truth value. For example:

np.testing.assert_equal(
    np.array(
        [],
        dtype=[
            ('time', 'datetime64[ns]'),
            ('end_time', int)
        ]
    ).view(np.recarray),
    np.array(
        [],
        dtype=[
            ('time', 'datetime64[ns]'),
            ('end_time', int)
        ]
    ).view(np.recarray)
)
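
A rough sketch of how this could sit inside a unittest.TestCase follows. np.testing.assert_equal raises AssertionError on a mismatch, so unittest reports it as an ordinary test failure. The class name, DTYPE, and make_records are hypothetical names chosen for the example.

```python
import unittest

import numpy as np

# Illustrative dtype, copied from the question.
DTYPE = [("time", "datetime64[ns]"), ("end_time", int)]


def make_records(rows):
    """Build a record array from a list of (time, end_time) tuples."""
    return np.array(rows, dtype=DTYPE).view(np.recarray)


class RecarrayTestingModuleTest(unittest.TestCase):
    def test_empty_arrays_match(self):
        # Passes silently: same shape, and no elements that differ.
        np.testing.assert_equal(make_records([]), make_records([]))

    def test_shape_mismatch_is_reported(self):
        # An empty array and a one-row array have different shapes,
        # so assert_equal raises AssertionError.
        with self.assertRaises(AssertionError):
            np.testing.assert_equal(make_records([]), make_records([(1, 1)]))


# Run the suite without calling sys.exit(), unlike unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RecarrayTestingModuleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```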
Grr answered Jan 05 '23 16:01
