Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Truncating an md5 hash, How do I calculate the odds of a collision occurring?

Tags:

md5

I want to truncate an md5 hash to about half size. How much does that increase the odds of collisions? if I'm dealing with around 500 000 generations, should I be worried about a collision? what about 1m generations.

like image 678
dc. Avatar asked Feb 13 '10 04:02

dc.


People also ask

What are the odds of an MD5 hash collision?

MD5: The fastest and shortest generated hash (16 bytes). The probability of just two hashes accidentally colliding is approximately: 1.47*10-29.

How do you find the probability of a hash collision?

For all elements to collide, the elements should be equal to twice the total number of hash values. If we hash M values and total possible hash values is T, then the expected number of collisions will be C = M * (M-1) / 2T.

Can you truncate a hash?

Yes. Truncating the output of a hash function always decreases its (theoretical) collision-resistance. In practice, it usually doesn't matter too much; for instance, 280 time is still pretty big. Still, if you used the full output of SHA-256, the same birthday attack would take 2128 time, which is totally out of reach.

How do you make a hash collision?

A hash collision is created when we take two different inputs of data, and then create the same hash. One way of doing with is to search for two data elements and add random data in order to find the same hash.


2 Answers

The math you're looking for is on Wikipedia's birthday attack page.

We consider the following experiment. From a set of H values we choose n values uniformly at random thereby allowing repetitions. Let p(n; H) be the probability that during this experiment at least one value is chosen more than once. This probability can be approximated as

p(n;H) ~= 1-e^(-n^2/(2H))

With 128 bits the chance of a collision among 500,000 hash values is around 10-28. If you halve the size of the collision space then the chance of collision is around 10-9. That is, even though the chance is vastly greater it's still very, very low. It depends on how critical it is that there be no collisions. 10-9 is on the order of one in a billion, so while extremely unlikely it's within the realm of possibility.

For reference:

1028 = 10 octillion = 10 billion billion billion
109 = 1 billion

like image 103
John Kugelman Avatar answered Oct 11 '22 02:10

John Kugelman


There's an interesting mathematical problem called the birthday problem that deals with that kind of situation. The fact is that the more entries you push in, the higher the chances to have a collision.

Following the table posted on the above link, assuming your digests are 64 bits each (since a single MD5 hash is 128 bits) and that MD5 have a uniform distribution, there is a very low chance that two hashes will collide. It becomes significant (1% chance or more) at 610,000,000 entries.

like image 44
zneak Avatar answered Oct 11 '22 03:10

zneak