Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do cryptographic hash functions reach each possible values, i.e., are they surjective?

Take a commonly used binary hash function - for example, SHA-256. As the name implies, it outputs a 256 bit value.

Let A be the set of all possible 256 bit binary values. A is extremely large, but finite.

Let B be the set of all possible binary values. B is infinite.

Let C be the set of values obtained by running SHA-256 on every member of B. Obviously this can't be done in practice, but I'm guessing we can still do mathematical analysis of it.

My Question: By necessity, CA. But does C = A?

EDIT: As was pointed out by some answers, this is wholly dependent on the has function in question. So, if you know the answer for any particular hash function, please say so!

like image 998
levand Avatar asked Apr 17 '10 14:04

levand


People also ask

Are hash functions Surjective?

Hash functions are unidirectional, a surjective function that does not allow reverse, so it is easy to calculate the hash value from any input of arbitrary length. In contrast, it is harder to return to the previous value of the message from a given hash value.

Are cryptographic hash functions Bijective?

A hash function is any function that can be used to map data of arbitrary size to data of fixed size. So by definition, a hash function cannot be bijective, because its domain is infinite, while its range is finite.

What are the 3 properties of a cryptographic hash function?

A function that maps a bit string of arbitrary length to a fixed length bit string and is expected to have the following three properties: 1) Collision resistance (see Collision resistance), 2) Preimage resistance (see Preimage resistance) and 3) Second preimage resistance (see Second preimage resistance).

What are two properties of a cryptographic hash function?

What are two properties of a cryptographic hash function? (Choose two.) The hash function is one way and irreversible. The input for a particular hash algorithm has to have a fixed size. Hash functions can be duplicated for authentication purposes.


2 Answers

First, let's point out that SHA-256 does not accept all possible binary strings as input. As defined by FIPS 180-3, SHA-256 accepts as input any sequence of bits of length lower than 2^64 bits (i.e. no more than 18446744073709551615 bits). This is very common; all hash functions are somehow limited in formal input length. One reason is that the notion of security is defined with regards to computational cost; there is a threshold about computational power that any attacker may muster. Inputs beyond a given length would require more than that maximum computational power to simply evaluate the function. In brief, cryptographers are very wary of infinites, because infinites tend to prevent security from being even defined, let alone quantified. So your input set C should be restricted to sequences up to 2^64-1 bits.

That being said, let's see what is known about hash function surjectivity.

Hash functions try to emulate a random oracle, a conceptual object which selects outputs at random under the only constraint that it "remembers" previous inputs and outputs, and, if given an already seen input, it returns the same output than previously. By definition, a random oracle can be proven surjective only by trying inputs and exhausting the output space. If the output has size n bits, then it is expected that about 2^(2n) distinct inputs will be needed to exhaust the output space of size 2^n. For n = 256, this means that hashing about 2^512 messages (e.g. all messages of 512 bits) ought to be enough (on average). SHA-256 accepts inputs very much longer than 512 bits (indeed, it accepts inputs up to 18446744073709551615 bits), so it seems highly plausible that SHA-256 is surjective.

However, it has not been proven that SHA-256 is surjective, and that is expected. As shown above, a surjectivity proof for a random oracle requires an awful lot of computing power, substantially more than mere attacks such as preimages (2^n) and collisions (2^(n/2)). Consequently, a good hash function "should not" allow a property such as surjectivity to be actually proven. It would be very suspicious: security of hash function stems from the intractability of their internal structure, and such an intractability should firmly oppose to any attempt at mathematical analysis.

As a consequence, surjectivity is not formally proven for any decent hash function, and not even for "broken" hash functions such as MD4. It is only "highly suspected" (a random oracle with inputs much longer than the output should be surjective).

like image 185
Thomas Pornin Avatar answered Sep 22 '22 23:09

Thomas Pornin


Not necessarily. The pigeonhole principle states that once one more hash beyond the size of A has been generated that there is a probability of collision of 1, but it does not state that every single element of A has been generated.

like image 41
Ignacio Vazquez-Abrams Avatar answered Sep 21 '22 23:09

Ignacio Vazquez-Abrams