I've searched a lot for MD5 hash collisions, but I've only found binary examples. I would like to find two UTF-8 strings that have the same MD5 hash. Are there any, or do collisions only work for binary data?
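It's definitely possible: MD5 maps inputs of any length to a fixed 128-bit digest, so by the pigeonhole principle there are infinitely many colliding pairs. By that alone, some of these collisions are bound to be valid UTF-8 strings, but they're extremely rare, since most colliding messages will be just random binary garbage.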
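To put a number on "extremely rare", here is a small Python sketch that counts exactly how many byte strings of length n are valid UTF-8 (per RFC 3629, which Python's strict `utf-8` codec follows) and divides by 256**n. It assumes the bytes behave like uniformly random data, which real collision blocks only approximate, since the attacks impose bit conditions on them. The function names are just illustrative. Because UTF-8 is a prefix code, every valid string splits uniquely into encoded characters, so the count follows a simple recurrence on the length of the first character.

```python
# Number of valid UTF-8 encodings (RFC 3629) for each encoded length:
# 1 byte:  U+0000..U+007F                        -> 128
# 2 bytes: U+0080..U+07FF                        -> 1920
# 3 bytes: U+0800..U+FFFF minus 2048 surrogates  -> 61440
# 4 bytes: U+10000..U+10FFFF                     -> 1048576
SEQ_COUNTS = {1: 128, 2: 1920, 3: 61440, 4: 1048576}


def count_valid_utf8(n: int) -> int:
    """Count byte strings of length n that decode as strict UTF-8."""
    counts = [1] + [0] * n  # counts[k] = number of valid strings of length k
    for k in range(1, n + 1):
        # Condition on the byte length of the first encoded character.
        counts[k] = sum(c * counts[k - size] for size, c in SEQ_COUNTS.items() if size <= k)
    return counts[n]


def valid_utf8_fraction(n: int) -> float:
    """Probability that n uniformly random bytes form valid UTF-8."""
    return count_valid_utf8(n) / 256 ** n


if __name__ == "__main__":
    for n in (1, 2, 16, 64, 128):
        print(n, valid_utf8_fraction(n))
```

The fraction falls off geometrically with n, so for inputs the size of an MD5 block (64 bytes) or more, both messages of a random-looking pair happening to be valid UTF-8 is vanishingly unlikely.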
If you absolutely need to find such messages, I recommend using the collision finder written by Patrick Stach, which should return a pair of colliding (essentially random) messages within a few hours, or my attempt to improve it. The latter uses techniques presented in later papers by Wang (the first person to demonstrate examples of MD5 collisions), Liang, Sasaki, Yajima and Klima.
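Whichever finder you use, its output still has to pass a UTF-8 check before it answers the question. A minimal Python sketch, assuming you've saved the two colliding messages as raw bytes in two files (the command-line file paths are hypothetical; I'm not assuming any particular output format from either finder):

```python
import hashlib
import sys


def is_utf8_collision(a: bytes, b: bytes) -> bool:
    """Return True if a and b differ, share an MD5 digest, and both decode as strict UTF-8."""
    if a == b:
        return False
    if hashlib.md5(a).digest() != hashlib.md5(b).digest():
        return False
    try:
        a.decode("utf-8")
        b.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check.py msg1.bin msg2.bin
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        print(is_utf8_collision(f1.read(), f2.read()))
```

In practice you would run the finder repeatedly and keep only the pairs for which this returns True.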
I think you could also use a length extension attack to some extent, but that requires a deeper understanding of what happens inside MD5.
There are UTF-8 collisions. By the nature of cryptographic hashes, finding them is intentionally difficult, even for a hash as broken as MD5.
You might search for MD5 rainbow tables, which map MD5 hashes back to short input strings such as passwords; since they are built for password cracking, those inputs are typically valid UTF-8. As @alk pointed out, a brute-force search is going to take a very long time.
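To see why brute force is slow, here is a Python sketch of a plain birthday search over UTF-8 strings, applied to MD5 truncated to a small number of bits (the truncation and the alphabet are my own choices for the demo, not part of any tool mentioned above). It needs roughly 2**(bits/2) hashes, so the full 128-bit digest would need on the order of 2**64 MD5 evaluations, plus a table of comparable size (cycle-finding variants avoid the memory, but not the work). That is why practical collision finders exploit MD5's internal structure instead.

```python
from __future__ import annotations

import hashlib
import itertools


def truncated_md5(s: str, bits: int) -> int:
    """Return the first `bits` bits of MD5 over the UTF-8 encoding of s."""
    digest = hashlib.md5(s.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)


def birthday_search(bits: int, alphabet: str = "abcdefghijklmnopqrstuvwxyzäöü€") -> tuple[str, str, int]:
    """Find two distinct strings whose truncated MD5 values match.

    Enumerates strings over `alphabet` (which includes multi-byte UTF-8
    characters) in order of length and remembers every truncated hash.
    Returns (first_string, second_string, attempts).
    """
    seen: dict[int, str] = {}
    attempts = 0
    for length in itertools.count(1):
        for chars in itertools.product(alphabet, repeat=length):
            s = "".join(chars)
            attempts += 1
            h = truncated_md5(s, bits)
            if h in seen:
                return seen[h], s, attempts
            seen[h] = s
    raise AssertionError("unreachable: itertools.count() never ends")


if __name__ == "__main__":
    s1, s2, n = birthday_search(24)
    print(repr(s1), repr(s2), n)
```

Each extra 8 bits of digest multiplies the expected work by about 16, which is how the cost climbs from a few thousand hashes at 24 bits to around 2**64 at 128 bits.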