Will the MD5 cryptographic hash function output be same in all programming languages?

Tags:

md5

I am building an API in PHP, and one of the parameters it will accept is an MD5-hashed value. I don't know much about other programming languages, or about MD5 itself. So my basic question is: if I accept MD5 values, will the value be the same no matter which language generated it, e.g. .NET, Java, Perl, Ruby, etc.?

Or are there limitations or validations I need to apply?
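On the validation part of the question, here is a minimal sketch (in Python for illustration, since the API itself is PHP) of how a receiving service might check and normalize an incoming MD5 value. It assumes clients send the digest as hexadecimal text; the function name `normalize_md5_hex` is hypothetical. Because some clients may emit uppercase hex, the sketch lowercases before any comparison.

```python
import re

# Exactly 32 hexadecimal characters (128 bits), in either case.
_MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def normalize_md5_hex(value: str) -> str:
    """Return the lowercase form of an MD5 hex digest, or raise ValueError.

    Hypothetical helper: assumes the client sends the digest as hex text.
    """
    candidate = value.strip()
    if not _MD5_HEX.fullmatch(candidate):
        raise ValueError(f"not a 32-character hex MD5 digest: {value!r}")
    return candidate.lower()


print(normalize_md5_hex("5D41402ABC4B2A76B9719D911017C592"))
```

This only checks the shape of the value; it cannot tell you which bytes the client hashed, which is the encoding issue discussed in the answers.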

asked Aug 10 '10 at 16:08 by jtanmay


People also ask

Are all MD5 hashes the same?

Yes, MD5 checksums are platform-agnostic and will produce the same value every time for the same input bytes (a file, an encoded string, and so on).

Does hash function produce same output?

Many hashing algorithms are available, but a few properties are expected of any cryptographic hash function. Determinism: the same input always produces the same output. Collision resistance: it should be difficult to find two distinct inputs that produce the same hashed output.

What is the output of an MD5 hash?

The output is always 128 bits, usually written as 32 hexadecimal digits. Note that MD5 is not an encryption algorithm but a cryptographic hash. This means you can use it to verify the integrity of a chunk of data, but you cannot reverse the hashing.
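A quick check of the 128-bit claim, using Python's standard `hashlib` as an example implementation: the raw digest is 16 bytes, and its hex form is 32 characters. The input is the well-known test sentence whose MD5 is widely published.

```python
import hashlib

digest = hashlib.md5(b"The quick brown fox jumps over the lazy dog")
raw = digest.digest()          # 16 raw bytes
hex_form = digest.hexdigest()  # 32 lowercase hex characters

print(len(raw) * 8, len(hex_form), hex_form)
# prints: 128 32 9e107d9d372bb6826bd81d3542a419d6
```

Any correct MD5 implementation in another language should print the same 32 hex digits for these exact bytes, possibly in uppercase.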

Is an MD5 hash unique?

If MD5 maps any arbitrary string to a 32-digit hex value, then by the Pigeonhole Principle the hash cannot be unique: there are more distinct arbitrary strings than there are distinct 32-digit hex values.
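To put numbers on the pigeonhole argument: counting possible digests against inputs of just 17 bytes is already enough.

```latex
\begin{aligned}
\#\{\text{32-digit hex values}\} &= 16^{32} = 2^{128} \approx 3.4 \times 10^{38},\\
\#\{\text{17-byte inputs}\} &= 256^{17} = 2^{136} > 2^{128}.
\end{aligned}
```

So even among 17-byte inputs, at least two must share a digest, and on average each digest is shared by $2^{136} / 2^{128} = 256$ of them.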


2 Answers

Yes, a correct implementation of MD5 will produce the same result; otherwise MD5 would not be useful as a checksum. Differences can come from text encoding and byte order: you must make sure the text is encoded to exactly the same sequence of bytes on both sides.
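To illustrate the encoding and byte-order point, here is a sketch that hashes the same five-character string after encoding it three ways. Taking UTF-16 little- and big-endian as the example of "byte order" is an assumption about what the answer means; the encodings chosen are only illustrative.

```python
import hashlib

text = "hello"

# The same characters turned into bytes three different ways.
encodings = ("utf-8", "utf-16-le", "utf-16-be")
digests = {enc: hashlib.md5(text.encode(enc)).hexdigest() for enc in encodings}

for enc, hex_digest in digests.items():
    print(f"{enc:<10} {hex_digest}")
```

All three digests differ, and only the UTF-8 one matches the widely published MD5 of "hello", `5d41402abc4b2a76b9719d911017c592`. The string is the same; the bytes are not.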

answered Sep 20 '22 at 17:09 by Andrey


It will, but there's a but.

It will because it's specified to reliably produce the same result given the same series of bytes. The point is that we can then compare those results to check that the bytes haven't changed, or perhaps digitally sign only the MD5 result rather than the entire source.
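Comparing digests to check that bytes haven't changed can be sketched as below. The helper name `bytes_unchanged` is hypothetical, and it lowercases the expected value in case the other side produced uppercase hex. MD5 is no longer collision-resistant in practice, so a check like this catches accidental corruption but should not be relied on against deliberate tampering or for signatures.

```python
import hashlib


def bytes_unchanged(data: bytes, expected_hex: str) -> bool:
    """Hypothetical helper: True if MD5(data) matches the expected hex digest."""
    actual = hashlib.md5(data).hexdigest()      # always lowercase hex
    return actual == expected_hex.strip().lower()


print(bytes_unchanged(b"hello", "5D41402ABC4B2A76B9719D911017C592"))   # True
print(bytes_unchanged(b"hello!", "5d41402abc4b2a76b9719d911017c592"))  # False
```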

The but is that a common source of bugs is making assumptions about how strings are encoded. MD5 works on bytes, not characters, so if we're hashing a string, we're really hashing a particular encoding of that string. Some languages (and, even more so, some runtimes) favour particular encodings, and some programmers are used to making assumptions about that encoding. Worse yet, some specs make assumptions about encodings. This can cause bugs where two different implementations produce different MD5 hashes for the same string. This is especially likely when characters fall outside the range U+0020 to U+007F (and since U+007F is a control character, that one has its own issues).
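A small sketch of the point about characters outside the ASCII range, using UTF-8 and Latin-1 as two example encodings a runtime might default to (the choice of these two is an assumption for illustration). Printable ASCII encodes to identical bytes in both, so the digests agree; an accented character does not.

```python
import hashlib


def md5_of(text: str, encoding: str) -> str:
    """MD5 hex digest of `text` after encoding it with `encoding`."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


ascii_text = "cafe"
accented = "caf\u00e9"  # "café"; U+00E9 is outside U+0020..U+007F

# Pure ASCII: UTF-8 and Latin-1 produce the same bytes, so the same digest.
same = md5_of(ascii_text, "utf-8") == md5_of(ascii_text, "latin-1")

# U+00E9 is 0xE9 in Latin-1 but 0xC3 0xA9 in UTF-8, so the digests differ.
differs = md5_of(accented, "utf-8") != md5_of(accented, "latin-1")

print(same, differs)  # True True
```

Agreeing on one encoding (UTF-8 is a common choice) in the API contract removes this ambiguity for callers in every language.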

All of this applies to other cryptographic hashes as well, such as the SHA family.

answered Sep 19 '22 at 17:09 by Jon Hanna