
numpy ndarray hashability

Tags: python, numpy

I have trouble understanding how the hashability of numpy objects is managed.

>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True

How come

  • numpy objects define a __hash__ method but are nevertheless not hashable, while
  • a class deriving from numpy.ndarray defines __hash__ and is hashable?

Am I missing something?

I'm using Python 2.7.1 and numpy 1.6.1

Thanks for any help!

EDIT: added objects ids

EDIT2: Following deinonychusaur's comment, and trying to figure out whether hashing is based on content, I played with numpy.ndarray.dtype and found something quite strange:

>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]

I'm puzzled... is there some (type-independent) caching mechanism in numpy?

marchelbling asked Mar 20 '12 11:03




1 Answer

I get the same results in Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None), and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick search, I found this post on the python.scientific.devel mailing list, which states that arrays were never intended to be hashable. Why ndarray.__hash__ is defined at all, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.
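
The isinstance check against Hashable only inspects whether the type provides a __hash__ that is not None; it never calls hash(). A minimal sketch of that gap in plain Python 3, without numpy (the helper try_hash and the variable names are made up for illustration; in Python 2 the ABC was spelled collections.Hashable):

```python
from collections.abc import Hashable  # collections.Hashable in Python 2


def try_hash(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A tuple type defines __hash__, but this particular tuple holds a list.
nested = (1, [2, 3])

abc_says_hashable = isinstance(nested, Hashable)  # structural check on the type
really_hashable = try_hash(nested)                # actually calls hash()

print(abc_says_hashable, really_hashable)
```

So a True result from the ABC check, as reported for nparray above, does not guarantee that hash() will succeed.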

EDIT: Note that nparray.__hash__() is derived directly from id(nparray) (in the question's output, 269709177 is exactly 4315346832 / 16), so this is just the default implementation inherited from object. Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that wouldn't propagate to subclasses and wouldn't stop you from calling ndarray.__hash__ explicitly?
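
The link between the default hash and id can be checked against the numbers posted in the question. A minimal sketch, assuming a 64-bit CPython build in which the default pointer hash rotates the address right by 4 bits; the helper name rotate_right_4 is made up for illustration and is not part of Python or numpy:

```python
def rotate_right_4(address, bits=64):
    """Rotate an unsigned address right by 4 bits, then read it as a signed int."""
    mask = (1 << bits) - 1
    rotated = ((address >> 4) | (address << (bits - 4))) & mask
    # Reinterpret the unsigned result as a signed two's-complement value.
    if rotated >= 1 << (bits - 1):
        rotated -= 1 << bits
    return rotated


# (id, __hash__()) pairs copied from the question's interpreter session.
observed = {
    4315346832: 269709177,             # nparray
    4315234352: 269702147,             # ndarray
    4299616456: -9223372036586049780,  # vector
    4317742576: 269858911,             # temporaries in EDIT2
}

matches = {addr: rotate_right_4(addr) == h for addr, h in observed.items()}
print(matches)
```

Every pair from the question matches under this assumption, which is consistent with __hash__ being the identity-based default rather than anything computed from the array contents.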

Things are different in Python 3.2.2 and the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False, as expected. hash(vector) also fails.
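
The __hash__ = None behaviour described here can be reproduced without numpy. A minimal sketch in plain Python 3; the classes NoHash, Child and Restored are hypothetical stand-ins (NoHash plays the role of ndarray, Child the role of Vector) and say nothing about how numpy is implemented internally:

```python
from collections.abc import Hashable


class NoHash(object):
    __hash__ = None  # explicitly opt out of hashing


class Child(NoHash):
    pass  # inherits __hash__ = None, so it is unhashable too


class Restored(NoHash):
    def __hash__(self):
        # Opt back in with the identity-based default from object.
        return object.__hash__(self)


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


results = {
    cls.__name__: (is_hashable(cls()), isinstance(cls(), Hashable))
    for cls in (NoHash, Child, Restored)
}
print(results)
```

With this technique the opt-out propagates to subclasses unless they define __hash__ themselves, unlike the older behaviour shown in the question. Whether an identity-based hash is a good idea for a mutable array is a separate question.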

James answered Sep 17 '22 22:09