I would like to map values returned by Python's hash() function to floats in the range 0 to 1. On my system I can do this with
scale = 1.0/(2**64)
print hash(some_object)*scale+0.5
However, I know this will be different on 32-bit systems. Most likely I will never run this code anywhere else, but I would still like to know whether there is a way to programmatically determine the maximum and minimum values that Python's built-in hash() function can return.
(By the way the reason I'm doing this is that I'm developing a numerical simulation in which I need to consistently generate the same pseudo-random number from a given Numpy array. I know the built-in hash won't have the best statistics for this, but it's fast, so it's convenient to use it for testing purposes.)
In Python 2.7, hash() returns an int, so its values fall between -sys.maxint - 1 and sys.maxint.
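As a rough illustration of using that range for the mapping asked about, here is a minimal sketch. The helper name hash_to_unit and the fallback to sys.maxsize (for Python 3, where sys.maxint no longer exists) are assumptions of this sketch, not part of the original answer.

```python
import sys

# Python 2 exposes sys.maxint (the largest C long). Python 3 removed it,
# so this sketch falls back to sys.maxsize there (an assumption).
HASH_MAX = getattr(sys, "maxint", sys.maxsize)
HASH_MIN = -HASH_MAX - 1


def hash_to_unit(obj):
    """Map hash(obj) linearly from [HASH_MIN, HASH_MAX] into [0, 1]."""
    span = float(HASH_MAX) - float(HASH_MIN)
    return (hash(obj) - HASH_MIN) / span


# A tuple of ints hashes the same way on every run of the same interpreter.
print(hash_to_unit((1, 2, 3)))
```

Because of floating-point rounding, a hash very close to the upper bound can map to exactly 1.0, so treat the result as lying in the closed interval [0, 1].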
This is not really an answer to your main question, but an answer to your fine print. The numpy RNG takes numpy arrays as seeds (hashing them internally):
>>> import numpy
>>> a = numpy.arange(1000)
>>> b = a.copy()
>>> b[-1] = 0
>>> r1 = numpy.random.RandomState(a)
>>> r2 = numpy.random.RandomState(b)
>>> r3 = numpy.random.RandomState(a)
>>> r1.rand()
0.9343370187421804
>>> r3.rand()
0.9343370187421804
>>> r2.rand()
0.4651506189783071
hash() calls the __hash__ hook on the object passed in. That hook should return an integer.
Because Python ints are limited in size only by memory, theoretically there is no real upper limit to the values that hash() can return.
If you want to trace how Python objects implement this, search for the tp_hash slot in the Objects/ directory, or look for PyObject_Hash function calls to see how the value of those slots is used by sets, dictionaries, and other code.
CPython long integer objects themselves limit the return value to a C long int.
Internally, the CPython type tp_hash function takes any value returned from a Python __hash__ function that falls outside the range of a C long int and replaces it with the Python long int hash of that value; so a hash value greater than sys.maxint will be transformed by calling hash() on that value again.
So in practice, hash() returns values between -sys.maxint - 1 and sys.maxint.
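The reduction described above can be observed directly. This is a small sketch, assuming CPython. The class HugeHash is hypothetical and exists only to return an oversized value from __hash__. It compares against sys.maxsize so that it also runs on Python 3.

```python
import sys


class HugeHash(object):
    """Hypothetical class whose __hash__ returns an oversized integer."""

    def __hash__(self):
        return 2 ** 100


h = hash(HugeHash())

# The raw return value of __hash__ does not survive: hash() hands back
# the hash of that big integer, which fits in the platform's hash type.
print(h == 2 ** 100)
print(-sys.maxsize - 1 <= h <= sys.maxsize)
```

On CPython the first line printed should be False and the second True.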
In Python 3 (3.2 and later), a new type was introduced, Py_hash_t; C long is, on some 64-bit platforms, still limited to only 32 bits, but Py_hash_t is the same size as a pointer, giving you 64 bits on any 64-bit platform. On Python 3, the sys.maxsize value reflects the maximum correctly; it is the largest value a Py_ssize_t (a signed integer the same size as a pointer) can hold, so hash() returns values between -sys.maxsize - 1 and sys.maxsize.
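On Python 3.2 and later, the width of the hash type can also be read from sys.hash_info, which avoids hard-coding 32 or 64. The following is a minimal sketch. The names hash_min and hash_max are chosen here for illustration.

```python
import sys

# sys.hash_info.width is the number of bits used for hash values
# (available on Python 3.2 and later).
width = sys.hash_info.width

# A signed integer of that width covers this range.
hash_min = -2 ** (width - 1)
hash_max = 2 ** (width - 1) - 1

print(width, hash_min, hash_max)

# On CPython the upper bound is expected to agree with sys.maxsize.
print(hash_max == sys.maxsize)
```

The question's scale factor then becomes 1.0 / 2 ** width on any platform.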