Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How many random elements before MD5 produces collisions?

Tags:

random

hash

md5

I've got an image library on Amazon S3. For each image, I md5 the source URL on my server plus a timestamp to get a unique filename. Since S3 can't have subdirectories, I need to store all of these images in a single flat folder.

Do I need to worry about collisions in the MD5 hash value that gets produced?

Bonus: How many files could I have before I'd start seeing collisions in the hash value that MD5 produces?

like image 834
Ben Throop Avatar asked Oct 16 '22 13:10

Ben Throop


People also ask

How likely is a MD5 collision?

MD5: The fastest and shortest generated hash (16 bytes). The probability of just two hashes accidentally colliding is approximately: 1.47*10-29.

Is MD5 a collision?

In 2004 it was shown that MD5 is not collision-resistant. As such, MD5 is not suitable for applications like SSL certificates or digital signatures that rely on this property for digital security.

What is the expected number of hash collisions?

The expected number of collisions (assuming that the hash function can be modeled as a random function) is precisely 2−n(m2); that is, the expected number of pairs of values x≠y with H(x)=H(y) (and so, to answer Ricky's question, H(x)=H(y)=H(z) would count as three collisions).

How many MD5 hashes are there?

For example, the MD5 hash is always 128 bits long (commonly represented as 16 hexadecimal bytes). Thus, there are 2^128 possible MD5 hashes.


1 Answers

Probability of just two hashes accidentally colliding is 1/2128 which is 1 in 340 undecillion 282 decillion 366 nonillion 920 octillion 938 septillion 463 sextillion 463 quintillion 374 quadrillion 607 trillion 431 billion 768 million 211 thousand 456.

However if you keep all the hashes then the probability is a bit higher thanks to birthday paradox. To have a 50% chance of any hash colliding with any other hash you need 264 hashes. This means that to get a collision, on average, you'll need to hash 6 billion files per second for 100 years.

like image 317
Kornel Avatar answered Oct 18 '22 03:10

Kornel