Can two different strings generate the same MD5 hash code?

For each of our binary assets we generate an MD5 hash. This is used to check whether a certain binary asset is already in our application. But is it possible that two different binary assets generate the same MD5 hash? In other words, is it possible that two different strings generate the same MD5 hash?
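A minimal sketch of the deduplication scheme the question describes, assuming assets are available as `bytes` and that an in-memory dictionary stands in for the application's asset store. The names `AssetStore` and `add_if_new` are hypothetical.

```python
import hashlib


class AssetStore:
    """Hypothetical in-memory store that deduplicates assets by MD5 digest."""

    def __init__(self) -> None:
        self._by_md5: dict[str, bytes] = {}

    def add_if_new(self, data: bytes) -> bool:
        """Store `data` unless an asset with the same MD5 digest already exists.

        Returns True if the asset was stored, False if it was treated as a duplicate.
        """
        key = hashlib.md5(data).hexdigest()
        if key in self._by_md5:
            # Same digest is *assumed* to mean same asset; this is exactly
            # the assumption the question is asking about.
            return False
        self._by_md5[key] = data
        return True


store = AssetStore()
print(store.add_if_new(b"logo.png bytes"))  # True
print(store.add_if_new(b"logo.png bytes"))  # False
```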

Lieven Cardoen asked Nov 18 '09 at 13:11



People also ask

Can two different strings have the same MD5 hash?

Generally, two files will have the same MD5 hash only if their contents are exactly the same. Even a single bit of variation will generate a completely different hash value. There is one caveat, though: an MD5 sum is only 128 bits (16 bytes), so there are only 2^128 possible hash values, and some distinct inputs must share one.
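To see the single-bit behaviour, here is a small sketch that flips one bit of an input and counts how many of the 128 output bits change. The input bytes are arbitrary; for a well-behaved hash roughly half the bits differ on average, but any particular count varies.

```python
import hashlib


def md5_bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between the MD5 digests of a and b."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")


original = b"binary asset"
# Flip the lowest bit of the first byte; everything else stays identical.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(len(hashlib.md5(original).digest()))  # 16 bytes = 128 bits
print(md5_bit_difference(original, flipped))  # typically somewhere around 64
```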

Can two strings have the same hash value?

In theory, yes. A SHA-256 hash has 256 bits, i.e. 32 bytes. Since there are more than 2^256 possible strings (there are already 256^33 = 2^264 strings of just 33 bytes), collisions must exist, even though none has ever been found for SHA-256.
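The counting argument can be checked with exact integer arithmetic: there are 256^k byte strings of length k, and this sketch finds the smallest k for which that exceeds the number of possible SHA-256 outputs.

```python
import hashlib

DIGEST_BITS = hashlib.sha256().digest_size * 8  # 32 bytes -> 256 bits
OUTPUTS = 2 ** DIGEST_BITS

# Smallest input length k (in bytes) such that 256**k exceeds the number of outputs.
# Past this length, the pigeonhole principle guarantees collisions.
k = 0
while 256 ** k <= OUTPUTS:
    k += 1

print(DIGEST_BITS, k)  # 256 33
```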

Can MD5 hashes be duplicated?

Yes, there can be collisions, but the chance of that happening by accident is so incredibly small that I wouldn't worry about it unless you were literally tracking many billions of pieces of content. If you're really afraid of accidental collisions, just compute both MD5 and SHA-1 hashes and compare both.
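A minimal sketch of the "do both" suggestion, assuming each asset can be read from a file-like object; `composite_key` is a hypothetical helper name. Two assets are treated as equal only if both digests match. This mainly guards against accidental collisions, which is what the advice addresses.

```python
import hashlib
import io
from typing import BinaryIO


def composite_key(stream: BinaryIO, chunk_size: int = 65536) -> tuple[str, str]:
    """Return (md5_hex, sha1_hex) for everything readable from `stream`."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    # Read in chunks so large binary assets need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


key = composite_key(io.BytesIO(b"abc"))
print(key)
```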

Are MD5 hashes always the same?

Yes, MD5 always produces the same output for the same input. That's how hashes are used for passwords: you store the hash in the database, then when the user types their password in, it's hashed again and the two hashes are compared. NOTE: MD5 is not recommended for hashing passwords because it's cryptographically weak.
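The store-then-compare flow can be sketched as follows. Because of the warning against MD5, this sketch swaps in PBKDF2-HMAC-SHA256 from Python's standard library with a random per-user salt. The iteration count is an assumed value, not something the source specifies; tune it for your hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; adjust for your environment


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store in the database."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
print(verify_password("wrong guess", salt, key))    # False
```

The same password always derives the same key for a given salt, which is what makes the comparison work; the random salt means two users with the same password still get different stored values.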


2 Answers

For a set of even billions of assets, the chance of a random collision is negligibly small; it is nothing you should worry about. Considering the birthday paradox, given a set of 2^64 (or 18,446,744,073,709,551,616) assets, the probability of at least one MD5 collision within this set is roughly 40%, reaching 50% at about 1.18 × 2^64 assets. At this scale, you'd probably beat Google in terms of storage capacity.
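These figures can be reproduced with the standard birthday approximation p ≈ 1 − exp(−n(n−1)/(2N)), where n is the number of assets and N = 2^128 is the number of possible MD5 values. This sketch assumes MD5 outputs behave like uniformly random values, which is reasonable for accidental collisions but says nothing about deliberately crafted ones.

```python
import math

N = 2 ** 128  # number of possible MD5 digests


def collision_probability(n: int, space: int = N) -> float:
    """Approximate probability of at least one collision among n random digests."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


print(collision_probability(10 ** 9))  # about 1.5e-21 for a billion assets
print(collision_probability(2 ** 64))  # about 0.39
# Number of assets (in units of 2**64) at which the probability reaches 50%:
print(math.sqrt(2 * N * math.log(2)) / 2 ** 64)  # about 1.18
```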

However, because the MD5 hash function has been broken (it's vulnerable to a collision attack), any determined attacker can produce two colliding assets in a matter of seconds of CPU time. So if you want to use MD5, make sure that such an attacker could not compromise the security of your application!

Also, consider the consequences if an attacker could forge a new asset that collides with an existing asset in your database. While no practical attacks of this kind (second-preimage attacks) against MD5 are known (as of 2011), they could become possible by extending the current research on collision attacks.

If these turn out to be a problem, I suggest looking at the SHA-2 family of hash functions (SHA-256, SHA-384 and SHA-512). The downside is that they are slightly slower and produce longer hash output.
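With Python's `hashlib`, switching to SHA-2 usually means replacing `hashlib.md5` with, for example, `hashlib.sha256`. This sketch only compares output sizes (the "longer hash output" trade-off); relative speed depends on the platform and is not measured here.

```python
import hashlib

data = b"binary asset"
sizes: dict[str, int] = {}
for name in ("md5", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, data)
    sizes[name] = h.digest_size * 8  # digest_size is reported in bytes
    print(f"{name:>6}: {sizes[name]:3d} bits  {h.hexdigest()[:16]}...")
```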

intgr answered Oct 18 '22 at 13:10



MD5 is a hash function – so yes, two different strings can absolutely generate colliding MD5 codes.

In particular, note that MD5 codes have a fixed length (128 bits), so the number of possible MD5 codes is limited to 2^128. The number of strings (of any length), however, is unlimited, so by the pigeonhole principle there must be collisions.
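The counting argument becomes concrete if the output is shrunk. In this illustrative sketch only the first 3 bytes (24 bits) of each MD5 digest are kept, so there are just 2^24 possible values and a collision must appear within 2^24 + 1 distinct inputs; by the birthday effect one typically shows up after a few thousand. The truncation is purely for demonstration: the same search on the full 128-bit digest would need on the order of 2^64 attempts. `find_truncated_collision` is a hypothetical helper name.

```python
import hashlib
from itertools import count


def find_truncated_collision(nbytes: int = 3) -> tuple[bytes, bytes]:
    """Find two distinct inputs whose MD5 digests share the first `nbytes` bytes."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = str(i).encode()  # each i yields a distinct input
        prefix = hashlib.md5(msg).digest()[:nbytes]
        if prefix in seen:
            return seen[prefix], msg
        seen[prefix] = msg
    raise AssertionError("unreachable: count() never ends")


a, b = find_truncated_collision()
print(a, b, hashlib.md5(a).hexdigest()[:6], hashlib.md5(b).hexdigest()[:6])
```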

Konrad Rudolph answered Oct 18 '22 at 13:10
