What are the important points about cryptographic hash functions?

Tags:

I was reading this question on MD5 hash values and the accepted answer confuses me. One of the main properties, as I understand it, of a cryptopgraphic hash function is that it is infeasible to find two different messages (inputs) with the same hash value.

Yet the consensus answer to the question Why aren't MD5 hash values reversible? is Because an infinite number of input strings will generate the same output. This seems completely contradictory to me.

Also, what perplexes me somewhat is the fact that the algorithms are public, yet the hash values are still irreversible. Is this because there is always data loss in a hash function so there's no way to tell which data was thrown away?

What happens when the input data size is smaller than the fixed output data size (e.g., hashing a password "abc")?

EDIT:

OK, let me see if I have this straight:

It is really, really hard to infer the input from the hash because there are an infinite amount of input strings that will generate the same output (irreversible property).
However, finding even a single instance of multiple input strings that generate the same output is also really, really hard (collision resistant property).

480

asked Jun 24 '09 13:06

Rob Sobers

2 Answers

Warning: Long answer

I think all of these answers are missing a very important property of cryptographic hash functions: Not only is it impossible to compute the original message that was hashed to get a given hash, it's impossible to compute any message that would hash to a given hash value. This is called preimage resistance.

(By "impossible" - I mean that no one knows how to do it in less time than it takes to guess every possible message until you guess the one that was hashed into your hash.)

(Despite popular belief in the insecurity of MD5, MD5 is still preimage resistant. Anyone who doesn't believe me is free to give me anything that hashes to 2aaddf751bff2121cc51dc709e866f19. What MD5 doesn't have is collision resistance, which is something else entirely.)

Now, if the only reason you can't "work backwards" in a cryptographic hash function was because the hash function discards data to create the hash, then it would not guarantee preimage resistance: You can still "work backwards", and just insert random data wherever the hash function discards data, and while you wouldn't come up with the original message, you'd still come up with a message that hashes to the desired hash value. But you can't.

So the question becomes: Why not? (Or, in other words, how do you make a function preimage resistant?)

The answer is that cryptographic hash functions simulate chaotic systems. They take your message, break it into blocks, mix those blocks around, have some of the blocks interact with each other, mix those blocks around, and repeat that a lot of times (well, one cryptographic hash function does that; others have their own methods). Since the blocks interact with each other, block C not only has to interact with block D to produce block A, but it has to interact with block E to produce block B. Now, sure, you can find values of blocks C, D, E that would produce the blocks A and B in your hash value, but as you go further back, suddenly you need a block F that interacts with C to make D, and with E to make B, and no such block can do both at the same time! You must have guessed wrong values for C, D, and E.

While not all cryptographic hash functions are exactly as described above with block interaction, they have the same idea: That if you try to "work backwards", you're going to end up with a whole lot of dead ends, and the time it takes for you to try enough values to generate a preimage is on the order of hundreds to millions of years (depending on the hash function), not much better than the time it would take just to try messages until you find one that works.

185

answered Sep 28 '22 05:09

Zarel

1: The primary purpose of a hash is to map a very, very large space to a smaller but still very large space (e.g., MD5, which will take 'anything' and convert it into a space of size 2^128 -- big, but not nearly as big as aleph-0.)

In addition to other features, good hashes fill the destination space homogeneously. Bad hashes fill the space in a clumpy fashion, coming up with the same hash for many common inputs.

Imagine the idiotic hash function sum(), which just adds all the digits of the input number: it succeeds in mapping down, but there are a bunch of collisions (inputs with the same output, like 3 and 12 and 21) at the low end of the output space and the upper end of the space is nearly empty. As a result it makes very poor use of the space, is easy to crack, etc.

So a good hash that makes even use of the destination space will make it difficult to find two inputs with the same output, just by the odds: if MD5 were perfect, the odds that two inputs would have the same output would be 2^-128. That's pretty decent odds: the best you can do without resorting to a larger output space. (In truth MD5 isn't perfect, which is one of the things that makes it vulnerable.)

But it will still be true that a huge number of inputs will map to any given hash, because the input space is 'infinite', and dividing infinity by 2^128 still gives you infinity.

2: Yes, hashes always cause data loss, except in the case where your output space is the same as, or larger than, your input space -- and in that case you probably didn't need to hash!

3: For smaller inputs, best practice is to salt the input. Actually, that's good practice for any cryptographic hashing, because otherwise an attacker can feed you specific inputs and try to figure out which hash you are using. 'Salt' is just a set of additional information that you append (or prepend) to your input; you then hash the result.

edit: In cryptography, it is also important that the hash function is resistant to preimage attacks, intuitively, that is hard to guess the input for a given output even knowing many other input/output pairs. The "sum" function could probably be guessed rather easily (but since it destroys data still might not be easy to reverse).

answered Sep 28 '22 04:09

Alex Feinman

Related questions
                            
                                How to check password manually in Asp.Net identity 2?
                            
                                GitHub source dynamic navigation
                            
                                Generate ASP.Net Membership password hash in pure T-SQL
                            
                                How to initialize a hash with keys from an array?
                            
                                Best algorithm for hashing number values?
                            
                                Unable to get a sha256 hash of a string [duplicate]
                            
                                SQL Server 2008 and HashBytes
                            
                                What is the space complexity of a hash table?
                            
                                Fastest way to compare directory state, or hashing for fun and profit
                            
                                How to replace Django's primary key with a different integer that is unique for that table
                            
                                Perl hash Data::Dumper
                            
                                C# MD5 hasher example
                            
                                Using two (or more) objects as a HashMap key
                            
                                hash each line in text file
                            
                                C# Threading - Reading and hashing multiple files concurrently, easiest method?
                            
                                password_compat for older php version
                            
                                Insert element at the beginning of Ruby Hash?
                            
                                Is perl's each function worth using?
                            
                                Working with single page websites and maintaining state using a URL hash and jQuery
                            
                                generating unique ids in hive

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What are the important points about cryptographic hash functions?

Tags:

cryptography

hash

hash-function

Rob Sobers

People also ask

2 Answers

Zarel

Alex Feinman

Recent Activity

Donate For Us