I feel like I'm way overthinking this problem, but here goes anyway...
I have a hash table with M slots in its internal array. I need to insert N elements into the hash table. Assuming that I have a hash function that randomly inserts am element into a slot with equal probability for each slot, what's the expected value of the total number of hash collisions?
(Sorry that this is more of a math question than a programming question).
Edit: Here's some code I have to simulate it using Python. I'm getting numerical answers, but having trouble generalizing it to a formula and explaining it.
import random
import pdb
N = 5
M = 8
NUM_ITER = 100000
def get_collisions(table):
col = 0
for item in table:
if item > 1:
col += (item-1)
return col
def run():
table = [0 for x in range(M)]
for i in range(N):
table[int(random.random() * M)] += 1
#print table
return get_collisions(table)
# Main
total = 0
for i in range(NUM_ITER):
total += run()
print float(total)/NUM_ITER
The formula for the SUM(x*(x+1)/2) metric can be found here, and the expected value appears to be (n/2m)* (n+2m -1) .
For all elements to collide, the elements should be equal to twice the total number of hash values. If we hash M values and total possible hash values is T, then the expected number of collisions will be C = M * (M-1) / 2T.
The probability of just two hashes accidentally colliding is approximately: 1.47*10-29. SHA256: The slowest, usually 60% slower than md5, and the longest generated hash (32 bytes). The probability of just two hashes accidentally colliding is approximately: 4.3*10-60.
b) Your hash function generates an n-bit output and you hash m randomly selected messages. The teacher's only answered a) like so: We expect to find one collision every 2n/2 hashes.
You'll find the answer here: Quora.com. The expected number of collisions for m buckets and n inserts is
n - m * (1 - ((m-1)/m)^n)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With