I have some problems understanding how the hashability of numpy objects is managed.
>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True
How come numpy.ndarray objects define a __hash__ method but are nevertheless not hashable, while a class deriving from numpy.ndarray defines __hash__ and is hashable? Am I missing something?
I'm using Python 2.7.1 and numpy 1.6.1
Thanks for any help!
EDIT: added the objects' ids
EDIT2:
Following deinonychusaur's comment, and trying to figure out whether hashing is based on content, I played with numpy.ndarray.dtype and found something quite strange:
>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]
I'm puzzled... is there some (type-independent) caching mechanism in numpy?
One workaround is to add each element of the array to the set instead of the array object itself. The individual elements are hashable even though the array is not.
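As a rough illustration of that workaround, the sketch below tries to add an array to a set and then adds its elements instead. The names values and seen are made up for the example, and it assumes a 1-D integer array.

```python
import numpy as np

values = np.array([1, 2, 2, 3])  # hypothetical sample data
seen = set()

# Adding the array object itself fails because ndarray is unhashable.
try:
    seen.add(values)
    add_failed = False
except TypeError:
    add_failed = True

# Adding the elements works; tolist() yields plain Python ints.
seen.update(values.tolist())

print(add_failed)
print(sorted(seen))
```

Note that this stores the distinct element values, not the array as a single key, so it only helps when that is what you actually need.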
Setting the writeable flag to False doesn't make the array hashable, for multiple reasons. The first is that such an array is still mutable: you can always set writeable=True again and resume writing to it, or do more exotic things like reassign its shape even while writeable is False.
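A minimal sketch of the writeable point, assuming a recent numpy on Python 3 and an array that owns its own data (the variable names are illustrative only):

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])
arr.flags.writeable = False

# Writes are rejected while the flag is off...
try:
    arr[0] = 99.0
    write_blocked = False
except ValueError:
    write_blocked = True

# ...but the array is still not hashable.
try:
    hash(arr)
    hashable_when_read_only = True
except TypeError:
    hashable_when_read_only = False

# The flag can simply be flipped back, so the "immutability" is not a guarantee.
arr.flags.writeable = True
arr[0] = 99.0

print(write_blocked, hashable_when_read_only, arr[0])
```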
To convert a NumPy array (ndarray) to a Python list, use the ndarray.tolist() method. It takes no parameters and returns a (possibly nested) Python list. During the conversion, each item is converted to the nearest compatible built-in Python type.
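For example, tolist() can be a stepping stone towards a hashable snapshot of an array's contents. The sketch below assumes a small 2-D integer array; the tuple conversion and the lookup dictionary are additions for illustration, not something numpy does for you.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# Nested Python list whose items are built-in ints, not numpy scalars.
as_list = arr.tolist()

# Lists are unhashable too, so freeze the rows into tuples before using
# the contents as a dictionary key.
as_tuple = tuple(tuple(row) for row in as_list)
lookup = {as_tuple: "2x2 block"}

print(as_list)
print(type(as_list[0][0]))
print(lookup[((1, 2), (3, 4))])
```

The key is a copy of the values at conversion time; later changes to arr are not reflected in it.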
I get the same results in Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None), and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick google, I found this post on the python.scientific.devel mailing list, which states that arrays have never been intended to be hashable - so why ndarray.__hash__ is defined, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.
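The mismatch between the isinstance check and hash() is easier to see once you know that the Hashable ABC only looks for a __hash__ attribute that is not None; it never calls it. The pure-Python sketch below shows this on a current Python 3, where the ABC lives in collections.abc (the collections.Hashable alias was removed in Python 3.10). NoHash and really_hashable are hypothetical names for the example.

```python
from collections.abc import Hashable


class NoHash:
    # Explicitly opt out of hashing.
    __hash__ = None


def really_hashable(obj):
    """Return True only if hash(obj) actually succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


nested = (1, [2, 3])  # a tuple has __hash__, but its list element does not

print(isinstance(NoHash(), Hashable))  # False: __hash__ is None
print(isinstance(nested, Hashable))    # True: only the attribute is checked
print(really_hashable(nested))         # False: hashing the list fails
```

So isinstance(obj, Hashable) returning True is a necessary condition for hash(obj) to work, not a sufficient one.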
EDIT: Note that nparray.__hash__() returns the same as id(nparray), so this is just the default implementation. Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that wouldn't propagate to subclasses, and wouldn't stop you from calling ndarray.__hash__ explicitly?
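For comparison, here is what the __hash__ = None technique looks like in plain Python 3. This is only a pure-Python analogy with made-up class names, not how numpy implements anything at the C level. Unlike the behaviour observed above, the opt-out here does propagate to subclasses unless a subclass restores a hash explicitly.

```python
class Unhashable:
    __hash__ = None  # opt out of hashing


class Inherits(Unhashable):
    pass  # inherits __hash__ = None


class Restores(Unhashable):
    __hash__ = object.__hash__  # bring back the default identity-based hash


class EqOnly:
    # In Python 3, defining __eq__ without __hash__ sets __hash__ to None.
    def __eq__(self, other):
        return isinstance(other, EqOnly)


def can_hash(obj):
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(can_hash(Unhashable()))  # False
print(can_hash(Inherits()))    # False
print(can_hash(Restores()))    # True
print(EqOnly.__hash__ is None) # True
```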
Things are different in Python 3.2.2 and the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False as expected. hash(vector) also fails.
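The Python 3 behaviour can be checked with a short script along these lines. It is a sketch that assumes a reasonably recent numpy on Python 3, reuses the Vector subclass from the question, and imports Hashable from collections.abc; try_hash is a helper invented for the example.

```python
from collections.abc import Hashable

import numpy as np


class Vector(np.ndarray):
    pass


def try_hash(obj):
    """Return True if hash(obj) succeeds, False on TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


nparray = np.array([0.0])
vector = Vector(shape=(1,), buffer=nparray)

print(np.ndarray.__hash__ is None)    # the attribute exists but is None
print(Vector.__hash__ is None)        # and the subclass inherits it
print(try_hash(nparray), try_hash(vector))
print(isinstance(nparray, Hashable))
```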