Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the chances that two messages have the same MD5 digest and the same SHA1 digest?

Given two different messages, A and B (maybe 20-80 characters of text, if size matters at all), what is the probability that the MD5 digest of A is the same as the MD5 digest of B and the SHA1 digest of A is the same as the SHA1 digest of B? That is:

(MD5(A) == MD5(B)) && (SHA1(A) == SHA1(B)) 

Assume no malicious intent, i.e., that the messages are not selected with an aim of finding a clash. I just want to know the odds of this happening naturally.

I'm thinking the chances are "astronomically low," but I'm not sure how to verify this.

More information: the size of the pool of possible messages is restricted, but large (several hundred million). Birthday paradox situations are exactly what I'm worried about.

like image 806
John Siracusa Avatar asked Aug 24 '09 15:08

John Siracusa


People also ask

Can 2 MD5 hashes be the same?

Generally, two files can have the same md5 hash only if their contents are exactly the same. Even a single bit of variation will generate a completely different hash value. There is one caveat, though: An md5 sum is 128 bits (16 bytes).

Can two files have the same SHA1 hash?

No they don't. If you think about it, sha1 output has 160 bits. There are more than 2^160 possible files, therefore there must be multiple (infinitely many) potential files that have the same hashes.

Can 2 values have the same hash?

Two files can have the same MD5 hash even if there are different. As the MD5 algorithm can take an infinity of input and give a limited number of output, it's not impossible, even if the probability of collision is very low. So, you have the short answer now, let's take a look at an example and how to avoid this issue.


1 Answers

Assuming uniform spread in the range of MD5 and SHA-1 hashes for random strings (which isn't the case), and assuming we're only talking about two strings and not talking about a pool of strings (so we avoid birthday-paradox-type complexities):

An MD5 hash is 128 bits wide, and SHA-1's is 160. With the above assumptions, two strings A and B have a probability of colliding P if both hashes collide. So

P(both collide) = P(MD5 collides) * P(SHA-1 collides) 

And

P(MD5 collides) = 1/(2^128) P(SHA-1 collides) = 1/(2^160) 

So

P(both) = 2^-128 * 2^-160 = 2^-288 ~= 2.01 x 10^-87 

Again, if you have a pool of strings and you're trying to determine the probabilities of collisions with the pool, you're in the domain of the birthday paradox and this probability I've calculated here doesn't apply. That and hashes aren't as uniform as they should be. In reality you're going to have a much higher collision rate, but it will still be tiny.


EDIT

Since you are dealing with a birthday paradox situation, apply the same logic as the solution to the birthday paradox. Let's look at it from the point of view of just one hash function:

N := the number of hashes in your pool (several hundred million) S := the size of your hash space (2^288) Therefore, P(There are no collisions) = (S!)/(S^N * (S - N)!) 

Let's pretend we have a nice even number of hashes like 2^29 (roughly 530 million).

P = (2^288!)/(2^288^(2^29) * (2^288 - 2^29)!) 

In short, I don't even want to think about calculating this number. I'm not even sure how you can go about estimating it. You'll at least need an arbitrary-precision calculator that can handle huge factorials without dying.

Note that the probabilities will follow a curve that starts at nearly 0 when N = 1 or 2, and it will reach 1 when N >= 2^288, similar in shape to the one on the Wikipedia page for the birthday paradox.

The birthday paradox reaches P = .5 when N = 23. In other words, the probability of a collision is 50% when N is 6% of S. If that scales (I'm not sure if it does), it means that there will be a 50% chance of a collision when you have 6% of 2^288 hashes. 6% of 2^288 is around 2^284. Your value of N (several hundred million) is nowhere near that. It's practically insignificant compared to your S, so I don't think you have anything to worry about. Collisions aren't very likely.

like image 143
Welbog Avatar answered Oct 02 '22 15:10

Welbog