When is hash(n) == n in Python?

Tags: python, python-3.x, python-internals, hash, python-2.7

I've been playing with Python's hash function. For small integers, it appears that hash(n) == n always holds. However, this does not extend to large numbers:

```
>>> hash(2**100) == 2**100
False
```

I'm not surprised; I understand that hash takes a finite range of values. What is that range?

I tried using binary search to find the smallest number n for which hash(n) != n:

```
>>> import codejamhelpers  # pip install codejamhelpers
>>> help(codejamhelpers.binary_search)
Help on function binary_search in module codejamhelpers.binary_search:

binary_search(f, t)
    Given an increasing function :math:`f`, find the greatest non-negative integer :math:`n` such that :math:`f(n) \le t`. If :math:`f(n) > t` for all :math:`n \ge 0`, return None.

>>> f = lambda n: int(hash(n) != n)
>>> n = codejamhelpers.binary_search(f, 0)
>>> hash(n)
2305843009213693950
>>> hash(n+1)
0
```
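The same search can be reproduced without the third-party codejamhelpers package. Below is a minimal sketch using a plain bisection. It assumes CPython 3, and the helper name largest_identity_hash is hypothetical, not part of any library. It relies on the predicate hash(n) == n being true for a prefix of the non-negative integers and false after that.

```python
import sys


def largest_identity_hash(hi: int = 2**64) -> int:
    """Return the greatest n in [0, hi) with hash(n) == n (hypothetical helper).

    Invariants: hash(lo) == lo and hash(hi) != hi. Both hold initially,
    because hash(0) == 0 and 2**64 is larger than the hash modulus.
    """
    lo = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hash(mid) == mid:
            lo = mid  # mid still hashes to itself: move the lower bound up
        else:
            hi = mid  # mid no longer hashes to itself: move the upper bound down
    return lo


n = largest_identity_hash()
print(n, hash(n + 1), sys.hash_info.modulus)
```

On a 64-bit build this should report the same n as the session above. n + 1 then equals sys.hash_info.modulus.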

What's special about 2305843009213693951? I note that it's less than sys.maxsize == 9223372036854775807.

Edit: I'm using Python 3. I ran the same binary search on Python 2 and got a different result, 2147483648, which I note is sys.maxint + 1.

I also played with [hash(random.random()) for i in range(10**6)] to estimate the range of the hash function. The max is consistently below the n found above. Comparing the minimums, it seems Python 3's hash is always non-negative, whereas Python 2's hash can take negative values.
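A quick check suggests that the non-negative minimum comes from the inputs rather than from hash() itself. random.random() returns floats in [0.0, 1.0), while in CPython 3 negative numbers hash to negative values. The one oddity near zero is -1, whose hash is -2, because -1 is reserved as an error signal in CPython's C-level hashing. This is a sketch assuming CPython 3, with the sample size reduced to keep it quick:

```python
import random

# The asker's experiment: floats drawn from [0.0, 1.0) give non-negative hashes.
samples = [hash(random.random()) for _ in range(10**4)]
print(min(samples) >= 0)

# Negative inputs give negative hashes; -1 is special-cased to -2.
print(hash(-2), hash(-1), hash(-0.5))
```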


asked Jun 03 '16 10:06

Colonel Panic

1 Answer

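2305843009213693951 is 2^61 - 1. It's the largest Mersenne prime that fits into 64 bits. On 64-bit builds, CPython computes the hash of a number modulo this prime. As a result, hash(n) == n holds for 0 <= n < 2^61 - 1 and fails from 2^61 - 1 onwards.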
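CPython 3 exposes the modulus as sys.hash_info.modulus, so the relationship can be checked directly. This is a minimal sketch assuming a 64-bit CPython build; 32-bit builds use 2^31 - 1 instead. It covers only non-negative integers, because negative integers keep their sign and -1 is special-cased.

```python
import sys

P = sys.hash_info.modulus
print(P == 2**61 - 1)  # expected True on 64-bit CPython builds

# For non-negative ints, hash(n) is n reduced modulo P.
for n in (0, 42, P - 1, P, P + 1, 2**100):
    assert hash(n) == n % P

# So hash(n) == n only while n < P.
print(hash(P - 1) == P - 1, hash(P) == P)

# 2**100 = 2**61 * 2**39, and 2**61 is congruent to 1 mod P.
print(hash(2**100) == 2**39)
```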

If you have to make a hash just by taking the value mod some number, then a large Mersenne prime is a good choice: it's easy to compute and ensures an even distribution of possibilities. (Although I personally would never make a hash this way.)
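Reduction modulo a Mersenne prime is cheap because 2^61 ≡ 1 (mod 2^61 - 1). The bits above position 61 can be folded back onto the low bits with a shift, a mask and an addition, so no division is needed. Below is a minimal pure-Python illustration of that trick. The function name mod_mersenne61 is hypothetical, and this is not CPython's actual C code.

```python
P = (1 << 61) - 1  # Mersenne prime 2**61 - 1


def mod_mersenne61(x: int) -> int:
    """Reduce a non-negative int modulo 2**61 - 1 using shifts, masks and adds."""
    if x < 0:
        raise ValueError("x must be non-negative")
    # Write x = hi * 2**61 + lo. Since 2**61 is congruent to 1 mod P,
    # x is congruent to hi + lo, which is strictly smaller than x whenever hi > 0.
    while x > P:
        x = (x & P) + (x >> 61)
    # x is now in [0, P], and P itself is congruent to 0.
    return 0 if x == P else x


for x in (0, 1, P - 1, P, P + 1, 2**100, 2**200 + 12345):
    assert mod_mersenne61(x) == x % P
```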

It's especially convenient to compute the modulus for floating-point numbers. A float is an integer mantissa multiplied by 2^x for some exponent x. Since 2^61 ≡ 1 (mod 2^61 - 1), you only need to consider the exponent mod 61.
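The sketch below shows the exponent trick in action. It assumes CPython 3 and finite inputs; infinities and NaN are hashed specially and are not handled. float_hash_sketch is a hypothetical name, and this is not CPython's implementation. It follows the numeric-hash rule that a float hashes like the exact rational value it represents, so it is expected to agree with hash() for finite floats. The modulus and its bit length are read from sys.hash_info, which gives 2^61 - 1 and 61 on 64-bit builds.

```python
import math
import sys

P = sys.hash_info.modulus  # 2**61 - 1 on 64-bit builds
BITS = P.bit_length()      # 61 on 64-bit builds, so 2**BITS is congruent to 1 mod P


def float_hash_sketch(x: float) -> int:
    """Hash a finite float using only the exponent modulo BITS (hypothetical helper)."""
    if not math.isfinite(x):
        raise ValueError("infinities and NaN are not handled in this sketch")
    # Every finite float is exactly num / 2**k for an integer num and k >= 0.
    num, den = x.as_integer_ratio()
    k = den.bit_length() - 1  # den == 2**k
    # Dividing by 2**k mod P is the same as multiplying by 2**(-k mod BITS).
    h = (abs(num) % P) * pow(2, -k % BITS, P) % P
    if num < 0:
        h = -h
    # hash() never returns -1 in CPython; it is replaced by -2.
    return -2 if h == -1 else h


for x in (0.0, 0.5, -1.0, 3.14, -2.5, 1e300, 1e-300, 2.0**100):
    assert float_hash_sketch(x) == hash(x)
```

For example, 0.5 is 1 / 2^1. Its hash is therefore 2^(-1 mod 61) = 2^60 on a 64-bit build.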

See: https://en.wikipedia.org/wiki/Mersenne_prime


answered Oct 21 '22 07:10

Matt Timmermans

Related questions
                            
                                Testing email sending in Django [closed]
                            
                                django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: No module named MySQLdb
                            
                                Loop that also accesses previous and next values
                            
                                Python, HTTPS GET with basic authentication
                            
                                Changing user agent on urllib2.urlopen
                            
                                How to install MySQLdb (Python data access library to MySQL) on Mac OS X?
                            
                                Using an SSH keyfile with Fabric
                            
                                Python list rotation [duplicate]
                            
                                IOError: [Errno 32] Broken pipe when piping: `prog.py | othercmd`
                            
                                Python: import module from another directory at the same level in project hierarchy
                            
                                How to reliably guess the encoding between MacRoman, CP1252, Latin1, UTF-8, and ASCII
                            
                                If range() is a generator in Python 3.3, why can I not call next() on a range?
                            
                                setuptools: package data folder location
                            
                                How to use Python type hints with Django QuerySet?
                            
                                Is there any difference between django.conf.settings and import settings?
                            
                                How to add an element to the beginning of an OrderedDict?
                            
                                Python Method overriding, does signature matter?
                            
                                Convert 2d numpy array into list of lists [duplicate]
                            
                                data type not understood
                            
                                Pandas Left Outer Join results in table larger than left table
