Say I have a hash algorithm, and it's nice and smooth (The odds of any one hash value coming up are the same as any other value).
Now say that I know that the odds of picking 2 hashes and there being a collision are (For arguments sake) 50000:1.
Now say I pick 100 hashes. How do I calculate the odds of a collision within that set of 100 values, given the odds of a collision in a set of 2?
What is the general solution to this, so that I can come up with a number of hash attempts after which the odds fall below some acceptable threshold? E.g. I can say things like "A batch of 49999 hash value creations has a high chance of collision".
The net probability of collision is derived by multiplying them together. The method for calculating probability of collision is appropriate for most encounters that occur in near-circular orbits. 1r + σ2 2r.
Conclusions. So the probability of no collisions is exp(-1/2) or about 60%, which means there's a 40% chance of at least one collision. As a rule of thumb, a hash function with range of size N can hash on the order of √N values before running into collisions.
One method for resolving collisions looks into the hash table and tries to find another open slot to hold the item that caused the collision. A simple way to do this is to start at the original hash value position and then move in a sequential manner through the slots until we encounter the first slot that is empty.
SHA256: The slowest, usually 60% slower than md5, and the longest generated hash (32 bytes). The probability of just two hashes accidentally colliding is approximately: 4.3*10-60. As you can see, the slower and longer the hash is, the more reliable it is.
This is a generalization of the Birthday problem.
First calculate the probability that there is not a collision:
hashes_picked = 100
single_collision_odds = 50000
# safe_combinations is number of ways to pick hashes that don't overlap
safe_combinations = factorial(single_collision_odds) / factorial(single_collision_odds - hashes_picked)
# all_combinations is total number of ways to pick hashes
all_combinations = single_collision_odds ** hashes_picked
collision_chance = (all_combinations - safe_combinations) / all_combinations
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With