In [30]: import numpy as np
In [31]: d = np.dtype(np.float64)
In [32]: d
Out[32]: dtype('float64')
In [33]: d == np.float64
Out[33]: True
In [34]: hash(np.float64)
Out[34]: -9223372036575774449
In [35]: hash(d)
Out[35]: 880835502155208439
Why do these dtypes compare equal but hash differently?
Note that Python does promise that:
The only required property is that objects which compare equal have the same hash value…
My workaround for this problem is to call np.dtype on everything, after which hash values and comparisons are consistent.
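A minimal sketch of that workaround, assuming the inputs are all valid dtype specifiers. The `specs` list and the variable names are illustrative only and not part of any NumPy API:

```python
import numpy as np

# Several spellings of the same dtype (illustrative list).
specs = [np.float64, "float64", "f8", np.dtype(np.float64)]

# Normalize everything through np.dtype before comparing or hashing.
normalized = [np.dtype(s) for s in specs]

# After normalization, equality and hashing agree with each other.
all_equal = all(a == b for a in normalized for b in normalized)
same_hash = len({hash(n) for n in normalized}) == 1
unique_keys = set(normalized)

print(all_equal, same_hash, len(unique_keys))
```

With normalized keys, a dict or set treats all of these spellings as a single entry.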
As tttthomasssss notes, the types (classes) of np.float64 and d are different. They are different kinds of things:
In [435]: type(np.float64)
Out[435]: type
Type type means that it is a class, which is callable, so it can be used as:
In [436]: np.float64(0)
Out[436]: 0.0
In [437]: type(_)
Out[437]: numpy.float64
That call creates a numeric object, which is how a class behaves rather than a plain function. Since numpy uses a lot of compiled code, and ndarray uses its own __new__, I wouldn't be surprised if np.float64 straddles the line between the two.
In [438]: np.float64.__hash__??
Type: wrapper_descriptor
String Form:<slot wrapper '__hash__' of 'float' objects>
Docstring: x.__hash__() <==> hash(x)
I was thinking this would be the hash used by hash(np.float64), but it is actually the hash for an object of that type, e.g. hash(np.float64(0)). In that case hash(np.float64) just uses the default type.__hash__ method.
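The distinction can be checked directly. This is a small sketch, not a statement about NumPy internals; the variable names are made up for illustration:

```python
import numpy as np

# hash() of the class itself is looked up on its metaclass (type),
# not on np.float64.__hash__.
class_hash_is_type_hash = hash(np.float64) == type.__hash__(np.float64)

# np.float64.__hash__ is what runs for *instances* of the class,
# and it agrees with the hash of the equivalent Python float.
instance_hash_matches_float = hash(np.float64(0.5)) == hash(0.5)

print(class_hash_is_type_hash, instance_hash_matches_float)
```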
Moving on to the dtype:
In [439]: d=np.dtype(np.float64)
In [440]: type(d)
Out[440]: numpy.dtype
d is not a function or class:
In [441]: d(0)
...
TypeError: 'numpy.dtype' object is not callable
In [442]: d.__hash__??
Type: method-wrapper
String Form:<method-wrapper '__hash__' of numpy.dtype object at 0xb60f8a60>
Docstring: x.__hash__() <==> hash(x)
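This output alone does not show where __hash__ comes from. It is not the identity-based hash inherited from object, though: np.dtype hashes by content, which is why separately created float64 dtypes hash to the same value.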
Further illustrating the difference between float64 and d, look at the class inheritance stack:
In [443]: np.float64.__mro__
Out[443]:
(numpy.float64,
numpy.floating,
numpy.inexact,
numpy.number,
numpy.generic,
float,
object)
In [444]: d.__mro__
...
AttributeError: 'numpy.dtype' object has no attribute '__mro__'
In [445]: np.dtype.__mro__
Out[445]: (numpy.dtype, object)
So, going by the output above, np.float64 doesn't define its own hash for its instances; it just inherits it from float. d doesn't have an __mro__ because it's an object, not a class.
numpy has enough compiled code, and a long enough history of its own, that you can't count on Python documentation always applying. np.dtype and np.float64 evidently have __eq__ methods that allow them to be compared with each other, but numpy developers did not put any effort into making sure that the __hash__ methods comply, most likely because they don't need to use either as a dictionary key.
I've never seen code like:
In [453]: dd={np.float64:12,d:34}
In [454]: dd
Out[454]: {dtype('float64'): 34, numpy.float64: 12}
In [455]: dd[np.float64]
Out[455]: 12
In [456]: dd[d]
Out[456]: 34
They shouldn't behave this way, but __eq__ and __hash__ for numpy.dtype objects are broken on an essentially unfixable design level. I'll be pulling heavily from njsmith's comments on a dtype-related bug report for this answer.
np.float64 isn't actually a dtype. It's a type, in the ordinary sense of the Python type system. Specifically, if you retrieve a scalar from an array of float64 dtype, np.float64 is the type of the resulting scalar.
np.dtype(np.float64) is a dtype, an instance of numpy.dtype. dtypes are how NumPy records the structure of the contents of a NumPy array. They are particularly important for structured arrays, which can have very complex dtypes. While ordinary Python types could have filled much of the role of dtypes, creating new types on the fly for new structured arrays would be highly awkward, and it would probably have been impossible in the days before type-class unification.
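The relationship between the scalar type and the dtype can be seen in a short sketch, assuming a default float64 array; the variable names are illustrative:

```python
import numpy as np

arr = np.zeros(3)        # default dtype is float64
scalar = arr[0]          # retrieving an element yields a NumPy scalar

# np.float64 is an ordinary class: the type of the scalar.
scalar_type_is_float64 = type(scalar) is np.float64
float64_is_class = isinstance(np.float64, type)

# The dtype is an *instance* of np.dtype that describes the array contents
# and points back at the scalar type through its .type attribute.
dtype_is_instance = isinstance(arr.dtype, np.dtype)
dtype_points_to_type = arr.dtype.type is np.float64

print(scalar_type_is_float64, float64_is_class,
      dtype_is_instance, dtype_points_to_type)
```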
numpy.dtype implements __eq__ basically like this:
def __eq__(self, other):
    if isinstance(other, numpy.dtype):
        return regular_comparison(self, other)
    return self == numpy.dtype(other)
which is pretty broken. Among other problems, it's not transitive, it raises TypeError when it should return NotImplemented, and its output is really bizarre at times because of how dtype coercion works:
>>> x = numpy.dtype(numpy.float64)
>>> x == None
True
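The lack of transitivity can be illustrated with a small sketch. The variable names are made up for the example:

```python
import numpy as np

d = np.dtype(np.float64)

# dtype.__eq__ coerces the other operand to a dtype, so both of these hold...
eq_string = (d == "float64")
eq_type = (d == np.float64)

# ...but the string and the class have no such coercion between themselves,
# so "a == b and b == c" does not imply "a == c" here.
string_eq_type = ("float64" == np.float64)

print(eq_string, eq_type, string_eq_type)
```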
numpy.dtype.__hash__ isn't any better. It makes no attempt to be consistent with the __hash__ methods of all the other types numpy.dtype.__eq__ accepts (and with so many incompatible types to deal with, how could it?). Heck, it shouldn't even exist, because dtype objects are mutable! Not just mutable like modules or file objects, where it's okay because __eq__ and __hash__ work by identity. dtype objects are mutable in ways that will actually change their hash value:
>>> x = numpy.dtype([('f1', float)])
>>> hash(x)
-405377605
>>> x.names = ['f2']
>>> hash(x)
1908240630
When you try to compare d == np.float64, d.__eq__ builds a dtype out of np.float64 and finds that d == np.dtype(np.float64) is True. When you take their hashes, though, np.float64 uses the regular (identity-based) hash for type objects and d uses the hash for dtype objects. Normally, equal objects of different types should have equal hashes, but the dtype implementation doesn't care about that.
Unfortunately, it's impossible to fix the problems with dtype __eq__ and __hash__ without breaking APIs people are relying on. People are counting on things like x.dtype == 'float64' or x.dtype == np.float64, and fixing dtypes would break that.
They are not the same thing: np.float64 is a type, while d is an instance of numpy.dtype, hence they hash to different values. All dtype instances created the same way as d will hash to the same value, because they are equal in content (which of course does not necessarily mean they point to the same memory location).
Edit:
Given your code above you can try the following:
In [72]: type(d)
Out[72]: numpy.dtype
In [74]: type(np.float64)
Out[74]: type
which shows you that the two are of different types and hence will hash to different values. That different instances of numpy.dtype hash to the same value can be shown by the following example:
In [77]: import copy
In [78]: dd = copy.deepcopy(d) # Try copying
In [79]: dd
Out[79]: dtype('float64')
In [80]: hash(dd)
Out[80]: -6584369718629170405
In [81]: hash(d) # original d
Out[81]: -6584369718629170405
In [82]: ddd = np.dtype(np.float64) # new instance
In [83]: hash(ddd)
Out[83]: -6584369718629170405
# If using CPython, id returns the address in memory (see: https://docs.python.org/3/library/functions.html#id)
In [84]: id(ddd)
Out[84]: 4376165768
In [85]: id(dd)
Out[85]: 4459249168
In [86]: id(d)
Out[86]: 4376165768
It's nice to see that ddd (the instance created the same way as d) and d itself share the same object in memory, but dd (the copied object) uses a different address.
The equality checks evaluate as you would expect, given the hashes above:
In [87]: dd == np.float64
Out[87]: True
In [88]: d == np.float64
Out[88]: True
In [89]: ddd == np.float64
Out[89]: True
In [90]: d == dd
Out[90]: True
In [91]: d == ddd
Out[91]: True
In [92]: dd == ddd
Out[92]: True