
MD5 hash with different results

I'm trying to hash some strings with MD5, but I have noticed the following.

For the string: "123456çñ"

Some websites like

http://www.md5.net

www.md5.cz

md5generator.net

return: "66f561bb6b68372213dd9768e55e1002"

And others like:

http://www.adamek.biz/md5-generator.php

7thspace.com/webmaster_tools/online_md5_encoder.html

md5.rednoize.com/

return: "9e6c9a1eeb5e00fbf4a2cd6519e0cfcb"

I need to hash the strings with standard MD5 because my results have to match those of other systems. Which hash is the correct one?

Thanks in advance

Encripterrr asked Jul 27 '11 05:07



2 Answers

The problem, I guess, is different text encodings. The characters 'ç' and 'ñ' in your string are outside ASCII, so their byte representation depends on the encoding used: UTF-8, UTF-16, or a single-byte code page such as ISO-8859-1. Each choice gives a different byte sequence for the same string, and that produces different hashes.

Remember, MD5 hashes bytes, not characters - it's up to you how to encode those characters as bytes before feeding bytes to MD5. If you want to interoperate with other systems you have to use the same encoding as those systems.
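To make this concrete, here is a minimal Python 3 sketch that hashes the same string under several encodings and checks which one reproduces a digest reported by another system. The helper names `md5_hex` and `matching_encodings` and the candidate list are illustrative assumptions, not part of any of the websites mentioned in the question.

```python
import hashlib


def md5_hex(text: str, encoding: str) -> str:
    """Encode text with the given encoding, then return the MD5 hex digest."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


def matching_encodings(text, expected_hex, candidates=("utf-8", "latin-1", "utf-16-le")):
    """Return the candidate encodings whose MD5 digest equals expected_hex."""
    expected = expected_hex.strip().lower()
    return [enc for enc in candidates if md5_hex(text, enc) == expected]


text = "123456\u00e7\u00f1"  # the string from the question: 123456çñ

# Same characters, different bytes, therefore different digests.
for enc in ("utf-8", "latin-1", "utf-16-le"):
    print(enc, md5_hex(text, enc))

# Digest reported by the other system (here, the first value quoted in the question).
print(matching_encodings(text, "66f561bb6b68372213dd9768e55e1002"))
```

If the list comes back empty, the other system is probably using an encoding that is not among the candidates, so the list would need to be extended.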

sharptooth answered Oct 07 '22 17:10


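Let us use Python 2 to understand this. In Python 2 a plain string literal is a byte string, and the session below was typed in a UTF-8 terminal.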

>>> '123456çñ'
'123456\xc3\xa7\xc3\xb1'
>>> 'ç'
'\xc3\xa7'
>>> 'ñ'
'\xc3\xb1'

In the above output, we see the UTF-8 encoding of 'ç' and 'ñ'.

>>> from hashlib import md5
>>> md5('123456çñ').digest().encode('hex')
'66f561bb6b68372213dd9768e55e1002'

So, when we compute the MD5 hash of the UTF-8 encoded data, we get the first result.

>>> u'ç'
u'\xe7'
>>> u'ñ'
u'\xf1'

Here, we see the Unicode code points of 'ç' and 'ñ': U+00E7 and U+00F1.

>>> md5('123456\xe7\xf1').digest().encode('hex')
'9e6c9a1eeb5e00fbf4a2cd6519e0cfcb'

So, when we compute the MD5 hash of the data in which each character is stored as a single byte equal to its Unicode code point (which is how ISO-8859-1 encodes these characters), we get the second result.

So, the first group of websites is computing the hash of the UTF-8 encoded data, while the second group is hashing a single-byte encoding of the string, most likely ISO-8859-1.
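For readers on Python 3, where `str` is Unicode text and must be encoded explicitly, a rough equivalent of the session above might look like this. It is a sketch only; the variable names are illustrative, and the two printed digests should correspond to the two values quoted in the question if the session above is accurate.

```python
import hashlib

text = "123456\u00e7\u00f1"  # the string from the question: 123456çñ

utf8_bytes = text.encode("utf-8")         # two bytes each for ç and ñ
latin1_bytes = text.encode("iso-8859-1")  # one byte each for ç and ñ

print(utf8_bytes)    # b'123456\xc3\xa7\xc3\xb1'
print(latin1_bytes)  # b'123456\xe7\xf1'
print([hex(ord(ch)) for ch in text[-2:]])  # ['0xe7', '0xf1']

# MD5 is computed over the bytes, so each encoding gets its own digest.
print(hashlib.md5(utf8_bytes).hexdigest())
print(hashlib.md5(latin1_bytes).hexdigest())

# Python 3 refuses to hash text that has not been encoded yet.
try:
    hashlib.md5(text)
    rejected_str = False
except TypeError:
    rejected_str = True
print(rejected_str)  # True
```

The `TypeError` is useful in practice: it forces the caller to pick an encoding deliberately instead of inheriting whatever the terminal or web form happened to use.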

Susam Pal answered Oct 07 '22 18:10