I see this technique recommended in many places (including Stack Overflow), and I can't get it out of my head that this would reduce entropy! After all, you are hashing something that has already been hashed and already has a collision chance. Wouldn't collision chance on top of collision chance result in an even higher collision chance? After researching, it seems I'm wrong, but why?
If the hash function is collision resistant, hashing does not change the entropy of the key.
The true measure of the information in a hashed value is its entropy: the actual amount of information contained in the data, measured in bits. If the hash output is indistinguishable from random, the entropy is simply the number of bits in the hash.
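As a rough illustration of "measured in bits" (my own sketch, not from the answer above; the shannon_entropy helper is a made-up name), here is how you might estimate the Shannon entropy per byte of some data in Python:

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Estimate Shannon entropy in bits per byte of the given data."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# 1 MiB of OS randomness should sit close to the 8 bits/byte maximum,
# while a repetitive string carries far less information per byte.
print(shannon_entropy(os.urandom(1024 * 1024)))  # ~8.0
print(shannon_entropy(b"aaaaabbbbb" * 1000))     # exactly 1.0 (two equally likely symbols)
```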
If two records are directed to the same cell, both go into that cell as a linked list. This handles hash collisions efficiently, since records with the same hash value can coexist in the same cell, but it has its disadvantages; a minimal sketch of the idea follows.
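Here is that chaining idea as a small Python sketch; the ChainedHashTable class and its methods are hypothetical names I'm using for illustration, not any particular library's API:

```python
class ChainedHashTable:
    """Toy hash table using separate chaining: each bucket is a list
    that can hold every record whose key hashes to that slot."""

    def __init__(self, size: int = 8):
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # empty slot or collision: append to the chain

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)

# With only two buckets, different keys may land in the same cell; both survive.
t = ChainedHashTable(size=2)
t.put("alpha", 1)
t.put("beta", 2)
print(t.get("alpha"), t.get("beta"))  # 1 2
```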
Probably the one most commonly used is SHA-256, which the National Institute of Standards and Technology (NIST) recommends using instead of MD5 or SHA-1. The SHA-256 algorithm returns a hash value of 256 bits, or 64 hexadecimal digits.
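For example, with Python's standard hashlib module (just a quick check of the digest size):

```python
import hashlib

# SHA-256 always produces a 256-bit digest, regardless of input length.
digest = hashlib.sha256(b"hello world").hexdigest()
print(digest)        # 64 hexadecimal characters
print(len(digest))   # 64 -> 64 * 4 bits = 256 bits
```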
Since you tagged md5, I'll use that as an example. From Wikipedia:
if two prefixes with the same hash can be constructed, a common suffix can be added to both to make the collision more likely to be accepted as valid data by the application using it. Furthermore, current collision-finding techniques allow to specify an arbitrary prefix: an attacker can create two colliding files that both begin with the same content. All the attacker needs to generate two colliding files is a template file with a 128-byte block of data, aligned on a 64-byte boundary that can be changed freely by the collision-finding algorithm. An example MD5 collision, with the two messages differing in 6 bits, is:
And then the example plaintexts they give are 256 bytes long. Since the collision attack relies on a 128-byte block of data, and the hash digest is only 128 bits, there really isn't an increased risk of a collision attack succeeding beyond the first iteration; that is to say, you can't really influence the likelihood of a collision beyond the first hash.
Also consider that the entropy of the hash is the aforementioned 128 bits. Even granting that the best published collision attack needs only about 2^20.96 operations (again, from Wikipedia), it would take a great number of iterations to cause two inputs to collide. The first-glance reasoning that I think you're falling victim to is: each hash has some chance of colliding, so hashing a hash stacks collision chance on top of collision chance, and repeated hashing should make a collision ever more likely.
This can be disproven by counterexample fairly easily. Consider again MD5:
MD5 any two inputs 128 times in a row and you will see that this is not true. You probably won't find a single repeated hash between them; after all, you've only created 256 of the 2^128 possible hash values, leaving the space essentially untouched. The probability of a collision at each round is independent of all the other rounds.
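If you want to try that experiment yourself, here is a small sketch (the md5_chain helper is my own name, assuming Python's hashlib) that hashes two inputs 128 times each and looks for matching digests between the two chains:

```python
import hashlib

def md5_chain(data: bytes, rounds: int = 128) -> list[bytes]:
    """Repeatedly MD5 the previous digest, returning every intermediate digest."""
    digests = []
    current = data
    for _ in range(rounds):
        current = hashlib.md5(current).digest()
        digests.append(current)
    return digests

chain_a = md5_chain(b"input one")
chain_b = md5_chain(b"input two")

# Same-round matches would require a genuine MD5 collision somewhere along the chains,
# because the algorithm is deterministic.
same_round = sum(a == b for a, b in zip(chain_a, chain_b))

# Any-round matches: 256 digests drawn from a space of 2**128 values,
# so finding even one here would be astonishing.
any_round = len(set(chain_a) & set(chain_b))

print(same_round, any_round)  # expected: 0 0
```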
There are two ways to understand why this is so. The first is that each iteration is essentially trying to hit a moving target. I think you could construct a proof based on the birthday paradox that a surprisingly small number of hashing iterations is enough to make it likely that a digest from one input's chain matches a digest from the other input's chain. But those matches would almost certainly occur at different steps of the iteration, and once that happens the two chains still never produce the same output on the same iteration, because the hash algorithm itself is deterministic: the chains simply run in lockstep, offset from one another.
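As a rough back-of-the-envelope sketch of that birthday-paradox scale (my own arithmetic, not part of the original answer): with 128-bit digests, the classic approximation says you need on the order of 2^64 independent digests before a match becomes likely, far more than any reasonable number of re-hashing rounds:

```python
import math

DIGEST_BITS = 128                 # MD5 digest size
space = 2 ** DIGEST_BITS

# Classic birthday approximation: about sqrt(pi/2 * N) samples for a ~50% chance
# of at least one collision among uniformly random values from a space of size N.
expected_draws = math.sqrt(math.pi / 2 * space)
print(f"about 2^{math.log2(expected_draws):.1f} digests needed")  # ~2^64.3
```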
The other approach is to realize that the hash function actually adds entropy while it runs. Consider that an empty string has a 128-bit digest just like any other input; that cannot happen without entropy being added during the algorithm's steps. Losing entropy is also a necessary part of a cryptographic hash function: some information must be destroyed, or else the input could be recovered from the digest. For inputs longer than the digest, yes, entropy is lost on the whole; it has to be in order to fit into the digest length. But some entropy is also added.
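You can see the empty-string point directly with a quick hashlib check:

```python
import hashlib

# Zero bytes of input still produce a full 128-bit (32 hex character) digest,
# because MD5 mixes in its own fixed initial state and padding.
print(hashlib.md5(b"").hexdigest())
# d41d8cd98f00b204e9800998ecf8427e
```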
I don't have equally exact numbers for other hash algorithms, but I think all the points I've made generalize to other hash functions and one-way mapping functions.