Determining the range of values returned by Python's hash()

Tags: python, hash

I would like to map values returned by Python's hash() function to floats in the range 0 to 1. On my 64-bit system I can do this with:

scale = 1.0/(2**64)
print(hash(some_object)*scale + 0.5)

However, I know this will be different on 32-bit systems. Most likely I will never run this code anywhere else, but still I would like to know if there's a way to programmatically determine the maximum and minimum values that Python's built-in hash() function can return.

(By the way, the reason I'm doing this is that I'm developing a numerical simulation in which I need to consistently generate the same pseudo-random number from a given NumPy array. I know the built-in hash won't have the best statistics for this, but it's fast, so it's convenient for testing purposes.)

asked Oct 02 '13 08:10 by N. Virgo


3 Answers

In Python 2.7, hash() returns an int, so its range is bounded by sys.maxint: results lie between -sys.maxint - 1 and sys.maxint.
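
A minimal sketch of the question's mapping without the hard-coded 2**64, assuming Python 3, where sys.maxint no longer exists and sys.maxsize bounds hash() instead (on Python 2.7, sys.maxint would take its place). The names HASH_MIN, HASH_MAX, HASH_WIDTH and hash_to_unit are illustrative, not part of any library. On 64-bit builds only the top 53 bits of the shifted hash are kept, because a value close to 2**64 divided by 2**64 rounds up to exactly 1.0 in double precision (the original `hash*scale + 0.5` can also return 1.0 for the largest hash). Note too that in Python 3, hashes of str and bytes are salted per process unless PYTHONHASHSEED is set, so this mapping is only reproducible across runs for objects such as ints or tuples of ints.

```python
import sys

# Assumption: CPython 3, where hash() returns a Py_hash_t whose bounds match
# sys.maxsize (on Python 2.7, sys.maxint plays this role).
HASH_MAX = sys.maxsize
HASH_MIN = -sys.maxsize - 1
HASH_WIDTH = sys.maxsize.bit_length() + 1  # 64 on a 64-bit build, 32 on a 32-bit one


def hash_to_unit(obj) -> float:
    """Map hash(obj) onto the half-open interval [0, 1)."""
    offset = hash(obj) - HASH_MIN  # shift into [0, 2**HASH_WIDTH - 1]
    if HASH_WIDTH > 53:
        # A double has 53 significant bits; dropping the low bits keeps the
        # division exact, so the result can never round up to 1.0.
        return (offset >> (HASH_WIDTH - 53)) / 2**53
    return offset / 2**HASH_WIDTH


print(HASH_MIN, HASH_MAX)
print(hash_to_unit((1, 2, 3)))
```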

answered Nov 15 '22 15:11 by Nicola Musatti


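This is not really an answer to your main question, but to your fine print: NumPy's RandomState accepts a NumPy array of integers (each between 0 and 2**32 - 1) as a seed and uses the whole array to initialize the generator, so equal arrays give equal streams: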

>>> import numpy
>>> a = numpy.arange(1000)
>>> b = a.copy()
>>> b[-1] = 0
>>> r1 = numpy.random.RandomState(a)
>>> r2 = numpy.random.RandomState(b)
>>> r3 = numpy.random.RandomState(a)
>>> r1.rand()
0.9343370187421804
>>> r3.rand()
0.9343370187421804
>>> r2.rand()
0.4651506189783071
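
If the NumPy dependency is unwanted, a similar reproducible mapping can be sketched with the standard library. This is a different technique from the one above: random.Random accepts a bytes seed, and with the default seeding (version 2) all of its bits are used, so the result does not depend on PYTHONHASHSEED. The helper name float_from_bytes is hypothetical; for a NumPy array a, a.tobytes() would be one way to obtain the bytes, keeping in mind that those bytes depend on the array's dtype and byte order.

```python
import random


def float_from_bytes(data: bytes) -> float:
    """Return a reproducible pseudo-random float in [0, 1) determined by `data`."""
    # A private generator seeded from the bytes: identical bytes give an
    # identical stream, independent of hash randomization.
    return random.Random(data).random()


a = bytes(range(10))
b = bytes(range(9)) + b"\x00"  # same length, last byte changed
print(float_from_bytes(a), float_from_bytes(a), float_from_bytes(b))
```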

answered Nov 15 '22 15:11 by fjarri


hash() calls the __hash__ hook on the object passed in. That hook should return an integer.

Because Python integers are limited in size only by memory, in theory there is no upper limit on the values a __hash__ hook can return.
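
A small sketch of the hook in action, assuming Python 3: hash() defers to the class's __hash__ method, and a non-integer return value is rejected with a TypeError. The classes Point and BadHash are made up for illustration.

```python
class Point:
    """Hashable 2-D point; equal points must produce equal hashes."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate to the tuple hash so equal points hash equally.
        return hash((self.x, self.y))


class BadHash:
    def __hash__(self):
        return "not an integer"  # violates the __hash__ contract


print(hash(Point(1, 2)) == hash((1, 2)))  # True: hash() called Point.__hash__
try:
    hash(BadHash())
except TypeError as exc:
    print("TypeError:", exc)
```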

If you want to trace how Python objects implement this, search for the tp_hash slot in the Objects/ directory of the CPython source, or look for PyObject_Hash calls to see how the values of those slots are used by sets, dictionaries and other code.

In CPython 2, however, hash values are themselves limited to a C long int: that is the return type of the tp_hash slot.

Internally, the tp_hash function CPython uses for classes with a Python-level __hash__ method handles a return value that falls outside the range of a C long int by replacing it with the Python long int hash of that value; so a hash value greater than sys.maxint is transformed by calling hash() on that value again.
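
The transformation can be observed directly. This sketch assumes Python 3, where CPython applies the same rule with Py_hash_t in place of a C long: a __hash__ result outside that range is replaced by the hash of the integer itself. The class Huge is invented for the example.

```python
import sys


class Huge:
    def __hash__(self):
        return 2**100  # far outside the range of a Py_hash_t


h = hash(Huge())
print(h == hash(2**100))                     # True: reduced via the int hash
print(-sys.maxsize - 1 <= h <= sys.maxsize)  # True: back within bounds
```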

So in practice, hash() returns values between -sys.maxint - 1 and sys.maxint.

In Python 3, a new type, Py_hash_t, was introduced. On some 64-bit platforms a C long is still only 32 bits, but Py_hash_t is the same size as a pointer, giving you 64 bits on any 64-bit platform. On Python 3, sys.maxsize reflects the maximum correctly: it is the largest value a signed, pointer-sized integer (Py_ssize_t) can hold on your platform.
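
On Python 3.2 and later, sys.hash_info reports the hash parameters directly, which gives a second way to find the bounds programmatically. A minimal sketch, assuming CPython, where hash values are Py_hash_t and the width should therefore agree with sys.maxsize. It also shows that non-negative ints hash to their value modulo sys.hash_info.modulus, so hashes of ints alone do not cover the full range.

```python
import sys

info = sys.hash_info
bits = info.width  # width in bits of a hash value
hash_min = -(2 ** (bits - 1))
hash_max = 2 ** (bits - 1) - 1
print(bits, hash_min, hash_max)
print(hash_max == sys.maxsize)  # True on CPython, where Py_hash_t is pointer-sized

# Non-negative ints hash to their value modulo a prime (2**61 - 1 on 64-bit builds).
n = 2**100
print(hash(n) == n % info.modulus)  # True
```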

answered Nov 15 '22 15:11 by Martijn Pieters