Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate the odds of a collision in hash algorithms?

Tags:

Say I have a hash algorithm, and it's nice and smooth (The odds of any one hash value coming up are the same as any other value).

Now say that I know that the odds of picking 2 hashes and there being a collision are (For arguments sake) 50000:1.

Now say I pick 100 hashes. How do I calculate the odds of a collision within that set of 100 values, given the odds of a collision in a set of 2?

What is the general solution to this, so that I can come up with a number of hash attempts after which the odds fall below some acceptable threshold? E.g. I can say things like "A batch of 49999 hash value creations has a high chance of collision".

like image 517
izb Avatar asked Mar 25 '09 14:03

izb


People also ask

How do you find the probability of a collision?

The net probability of collision is derived by multiplying them together. The method for calculating probability of collision is appropriate for most encounters that occur in near-circular orbits. 1r + σ2 2r.

What is the probability of finding a collision for an ideal 60 bit hash function?

Conclusions. So the probability of no collisions is exp(-1/2) or about 60%, which means there's a 40% chance of at least one collision. As a rule of thumb, a hash function with range of size N can hash on the order of √N values before running into collisions.

How do you find the collision of a hash table?

One method for resolving collisions looks into the hash table and tries to find another open slot to hold the item that caused the collision. A simple way to do this is to start at the original hash value position and then move in a sequential manner through the slots until we encounter the first slot that is empty.

How likely is a collision in SHA256?

SHA256: The slowest, usually 60% slower than md5, and the longest generated hash (32 bytes). The probability of just two hashes accidentally colliding is approximately: 4.3*10-60. As you can see, the slower and longer the hash is, the more reliable it is.


2 Answers

This is a generalization of the Birthday problem.

like image 151
Pesto Avatar answered Oct 12 '22 10:10

Pesto


First calculate the probability that there is not a collision:

hashes_picked = 100
single_collision_odds = 50000

# safe_combinations is number of ways to pick hashes that don't overlap
safe_combinations = factorial(single_collision_odds) / factorial(single_collision_odds - hashes_picked)

# all_combinations is total number of ways to pick hashes
all_combinations = single_collision_odds ** hashes_picked   

collision_chance = (all_combinations - safe_combinations) / all_combinations
like image 41
recursive Avatar answered Oct 12 '22 11:10

recursive