I read the Wikipedia article about md5 hashes but I still can't understand how a hash can't be "reconstituted" back to the original text.
Could someone explain to someone who knows very little about cryptography how this works? What part of the function makes it one-way?
A one-way hash function is a mathematical function that generates a fingerprint of the input, but there is no way to get back to the original input. If the input is the same then the hash is always the same, if it changes at all, even by one character the output hash is completely different.
A hash function is a versatile one-way cryptographic algorithm that maps an input of any size to a unique output of a fixed length of bits. The resulting output, which is known as a hash digest, hash value, or hash code, is the resulting unique identifier we mentioned earlier.
Hashing is a mathematical operation that is easy to perform, but extremely difficult to reverse. (The difference between hashing and encryption is that encryption can be reversed, or decrypted, using a specific key.)
A one-way hash function starts with a group of characters, called a key, which you then map onto a hash, or hash value, which is a certain length. Modern hashes are 128 bits or more; however, the hash value will be shorter than the original string of characters. The hash value is sometimes called a message digest.
Since everyone until now has simply defined what a hash function was, I will bite.
A one-way function is not just a hash function -- a function that loses information -- but a function f
for which, given an image y
("SE" or 294 in existing answers), it is difficult to find a pre-image x such that f(x)=y
.
This is why they are called one-way: you can compute an image but you can't find a pre-image for a given image.
None of the ordinary hash function proposed until now in existing answers have this property. None of them are one-way cryptographic hash functions. For instance, given "SE", you can easily pick up the input "SXXXE", an input with the property that X-encode("SXXXE")=SE.
There are no "simple" one-way functions. They have to mix their inputs so well that not only you don't recognize the input at all in the output, but you don't recognize another input either.
SHA-1 and MD5 used to be popular one-way functions but they are both nearly broken (specialist know how to create pre-images for given images, or are nearly able to do so). There is a contest underway to choose a new standard one, which will be named SHA-3.
An obvious approach to invert a one-way function would be to compute many images and keep them in a table associating to each image the pre-image that produced it. To make this impossible in practice, all one-way function have a large output, at least 64 bits but possibly much larger (up to, say, 512 bits).
EDIT: How do most cryptographic hash functions work?
Usually they have at their core a single function that does complicated transformations on a block of bits (a block cipher). The function should be nearly bijective (it shouldn't map too many sequences to the same image, because that would cause weaknesses later) but it doesn't have to be exactly bijective. And this function is iterated a fixed number of times, enough to make the input (or any possible input) impossible to recognize.
Take the example of Skein, one of the strong candidates for the SHA-3 context. Its core function is iterated 72 times. The only number of iterations for which the creators of the function know how to sometimes relate the outputs to some inputs is 25. They say it has a "safety factor" of 2.9.
Think of a really basic hash - for the input string, return the sum of the ASCII values of each character.
hash( 'abc' ) = ascii('a')+ascii('b')+ascii('c') = 97 + 98 + 99 = 294
Now, given the hash value of 294, can you tell what the original string was? Obviously not, because 'abc' and 'cba' (and countless others) give the same hash value.
Cryptographic hash functions work the same way, except that obviously the algorithm is much more complex. There are always going to be collisions, but if you know string s
hashes to h
, then it should be very difficult ("computationally infeasible") to construct another string that also hashes to h
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With