Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

probability of collision in MD5

Tags:

md5

Worst case, I have 180 million values in a cache(15 minute window before they go stale) and an MD5 has 2^128 values. What is my probability of a collision? or better yet, is there a web page somewhere to answer that question or a rough estimate thereof? That would rock so I know my chances.

like image 615
Dean Hiller Avatar asked Jan 20 '17 17:01

Dean Hiller


People also ask

Is MD5 collision resistant?

Overview of security issues In 2004 it was shown that MD5 is not collision-resistant. As such, MD5 is not suitable for applications like SSL certificates or digital signatures that rely on this property for digital security.

How do you find the probability of a hash collision?

For all elements to collide, the elements should be equal to twice the total number of hash values. If we hash M values and total possible hash values is T, then the expected number of collisions will be C = M * (M-1) / 2T.

What is collision in MD5?

A collision is when you find two files to have the same hash. The research published by Wang, Feng, Lai and Yu demonstrated that MD5 fails this third requirement since they were able to generate two different messages that have the same hash.

Does sha256 have collisions?

The Wikipedia page gives an estimate of the likelihood of a collision. If you run the numbers, you'll see that all harddisks ever produced on Earth can't hold enough 1MB files to get a likelihood of a collision of even 0.01% for SHA-256. Basically, you can simply ignore the possibility.


1 Answers

The probability is 1-m!/(mⁿ(m-n)!) where m = 2¹²⁸ and n = 180000000.

Running it through on-line Wolfram exceeds available computational time!

If you have SmallTalk installed locally, you can run this:

|m n p|

m := 2 raisedTo:128.
n := 180000000.
p := (1-(m factorial/((m raisedTo:n)*(m-n)factorial)))asFloat.

Transcript show:p printString;cr.

A search for the Birthday Problem brings up a Wikipedia page where they provide a table showing for 128 bits and 2.6×10¹⁰ hashes, the probability of a collision is 1 in 10¹⁸, so that’s for 140× the number of hashes than what you’re considering. So you know your odds are “worse” than this.

A good approximation if n ≪ m is 1-e-n2/2m, where if you plug in m and n above, you get 4.76×10⁻²³ or 1 in 2.10×10²² as the probability of a collision.

Even though the probability of a collision is very low, it is prudent in the FOOBAR case, say if there is an issue and the hashes accumulate for more than 15 minutes, to at least confirm what would happen in the event of a collision. This will also help if someone somehow injects duplicate hashes in order to try to compromise it.

like image 116
Yimin Rong Avatar answered Sep 18 '22 10:09

Yimin Rong