Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When is hash(n) == n in Python?

I've been playing with Python's hash function. For small integers, it appears hash(n) == n always. However this does not extend to large numbers:

>>> hash(2**100) == 2**100 False 

I'm not surprised, I understand hash takes a finite range of values. What is that range?

I tried using binary search to find the smallest number hash(n) != n

>>> import codejamhelpers # pip install codejamhelpers >>> help(codejamhelpers.binary_search) Help on function binary_search in module codejamhelpers.binary_search:  binary_search(f, t)     Given an increasing function :math:`f`, find the greatest non-negative integer :math:`n` such that :math:`f(n) \le t`. If :math:`f(n) > t` for all :math:`n \ge 0`, return None.  >>> f = lambda n: int(hash(n) != n) >>> n = codejamhelpers.binary_search(f, 0) >>> hash(n) 2305843009213693950 >>> hash(n+1) 0 

What's special about 2305843009213693951? I note it's less than sys.maxsize == 9223372036854775807

Edit: I'm using Python 3. I ran the same binary search on Python 2 and got a different result 2147483648, which I note is sys.maxint+1

I also played with [hash(random.random()) for i in range(10**6)] to estimate the range of hash function. The max is consistently below n above. Comparing the min, it seems Python 3's hash is always positively valued, whereas Python 2's hash can take negative values.

like image 208
Colonel Panic Avatar asked Jun 03 '16 10:06

Colonel Panic


People also ask

What is hash () in Python used for?

What is Hash Method in Python? Hash method in Python is a module that is used to return the hash value of an object. In programming, the hash method is used to return integer values that are used to compare dictionary keys using a dictionary look up feature.

How do you create a hash function?

With modular hashing, the hash function is simply h(k) = k mod m for some m (usually, the number of buckets). The value k is an integer hash code generated from the key. If m is a power of two (i.e., m=2p), then h(k) is just the p lowest-order bits of k.


1 Answers

2305843009213693951 is 2^61 - 1. It's the largest Mersenne prime that fits into 64 bits.

If you have to make a hash just by taking the value mod some number, then a large Mersenne prime is a good choice -- it's easy to compute and ensures an even distribution of possibilities. (Although I personally would never make a hash this way)

It's especially convenient to compute the modulus for floating point numbers. They have an exponential component that multiplies the whole number by 2^x. Since 2^61 = 1 mod 2^61-1, you only need to consider the (exponent) mod 61.

See: https://en.wikipedia.org/wiki/Mersenne_prime

like image 135
Matt Timmermans Avatar answered Oct 21 '22 07:10

Matt Timmermans