When generating a SHA256 / 512 hash, is there a minimum 'safe' amount of data to hash?

I have heard that when creating a hash, it's possible that if small files or amounts of data are used, the resulting hash is more likely to suffer from a collision. If that is true, is there a minimum "safe" amount of data that should be used to ensure this doesn't happen?

I guess the question could also be phrased as:

What is the smallest amount of data that can be safely and securely hashed?

728

asked Jan 13 '11 04:01

PeterM

3 Answers

A hash function accepts inputs of arbitrary (or at least very large) length and produces a fixed-length output. There are more possible inputs than possible outputs, so collisions must exist. The whole point of a secure hash function is that it is "collision resistant": collisions must exist mathematically, but actually computing one is extremely hard. There is no known collision for SHA-256 or SHA-512, and the best known methods for deliberately computing one are so ludicrously expensive that they will not be applied any time soon (the whole US federal budget for a century would buy only a ridiculously small part of the task).

So, if it cannot be realistically done on purpose, you can expect not to hit a collision out of (bad) luck.

Moreover, if you limit yourself to very short inputs, there is a chance that there is no collision at all. E.g., if you consider 12-byte inputs: there are 2⁹⁶ possible sequences of 12 bytes. That's huge (more than can be enumerated with today's technology). Yet, SHA-256 will map each input to a 256-bit value, i.e. values in a much wider space (of size 2²⁵⁶). We cannot prove it formally, but chances are that all those 2⁹⁶ hash values are distinct from each other. Note that this has no practical consequence: there is no measurable difference between not finding a collision because there is none, and not finding a collision because it is extremely improbable to hit one.
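The "chances are" claim above can be backed by a rough birthday-bound estimate. The sketch below assumes SHA-256 behaves like a uniformly random function (an idealization, not a proven property) and computes the expected number of colliding pairs among all 2^96 twelve-byte inputs. The function name `expected_colliding_pairs` is made up for this illustration.

```python
from fractions import Fraction

def expected_colliding_pairs(log2_inputs: int, output_bits: int) -> Fraction:
    """Expected number of colliding pairs among 2**log2_inputs distinct inputs,
    assuming the hash behaves like a uniformly random function."""
    n = 2 ** log2_inputs
    pairs = n * (n - 1) // 2                 # number of unordered input pairs
    return Fraction(pairs, 2 ** output_bits)  # each pair collides with prob. 2**-output_bits

expected = expected_colliding_pairs(96, 256)

# The expected count is an upper bound on the probability of at least one collision.
print(float(expected))
print(expected < Fraction(1, 2 ** 65))
```

Under that assumption the expected count is slightly below 2^-65, so it is overwhelmingly likely (though not proven) that no two 12-byte inputs collide.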

Just to illustrate how low the risk of a SHA-256 collision is: consider your risk of being mauled by a gorilla escaped from a local zoo or private owner. Unlikely? Yes, but it still may conceivably happen: it seems that a gorilla escaped from the Dallas zoo in 2004 and injured four people; another gorilla escaped from the same zoo in 2010. Assuming that there is only one rampaging gorilla every 6 years on the whole Earth (not only in the Dallas area) and you happen to be the unlucky chap who is in its path, out of a human population of 6.5 billion, then the risk of grievous-bodily-harm-by-gorilla can be estimated at about 1 in 2^43.7 per day. Now, take ten thousand PCs and have them work on finding a collision for SHA-256. The chances of hitting a collision are close to 1 in 2^75 per day, which is more than a billion times less probable than the angry ape thing. The conclusion is that if you fear SHA-256 collisions but do not keep a loaded shotgun with you at all times, then you are getting your priorities wrong. Also, do not mess with Texas.
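For anyone who wants to check the gorilla arithmetic, here is a small sketch. It takes the inputs stated above (one incident every 6 years, 6.5 billion people) and uses the 2^75 collision figure as given rather than deriving it, since that depends on hash-rate assumptions not spelled out here.

```python
import math

DAYS_BETWEEN_INCIDENTS = 6 * 365   # one rampaging gorilla every 6 years
WORLD_POPULATION = 6.5e9           # population figure used in the answer

# Daily odds of being the single victim: 1 in (days * population).
gorilla_log2 = math.log2(DAYS_BETWEEN_INCIDENTS * WORLD_POPULATION)

# Collision odds per day (1 in 2**75), taken from the answer as given.
COLLISION_LOG2 = 75
ratio = 2 ** (COLLISION_LOG2 - gorilla_log2)

print(f"gorilla: 1 in 2^{gorilla_log2:.1f} per day")
print(f"a collision is about {ratio:.1e} times less likely")
```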

138

answered Oct 26 '22 15:10

Thomas Pornin

There is no minimum input size. The SHA-256 algorithm behaves, in effect, like a random mapping, and the collision probability does not depend on input length. Even a 1-bit input is 'safe'.

Note that the input is padded to a multiple of 512 bits (64 bytes) for SHA-256 (a multiple of 1024 bits for SHA-512). Taking a 12-byte input (as Thomas used in his example) with SHA-256, there are 2^96 possible padded blocks, each 64 bytes long.

As an example, the 12-byte input "Hello There!" (0x48656c6c6f20546865726521) is padded with a single one bit, followed by 351 zero bits, followed by the 64-bit representation of the input length in bits (0x0000000000000060), to form a 512-bit padded message. This 512-bit message is what the hash is actually computed over.
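The padding described above can be reproduced with a few lines of Python. This is only a sketch of the padding step for whole-byte messages (it does not compute the hash itself), and `sha256_pad` is a name invented for this example.

```python
def sha256_pad(message: bytes) -> bytes:
    """SHA-256 padding for whole-byte messages:
    a 1 bit, then zero bits, then the 64-bit big-endian length in bits."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                     # the 1 bit plus seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zero-fill to 56 bytes mod 64
    padded += bit_length.to_bytes(8, "big")        # 64-bit length field
    return padded

block = sha256_pad(b"Hello There!")
print(len(block) * 8)   # 512
print(block.hex())
```

For the 12-byte input, the seven zero bits after the 1 bit plus 43 zero bytes give the 351 zero bits mentioned above, and the last eight bytes are 0x0000000000000060.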

More details can be found in RFC 4634, "US Secure Hash Algorithms (SHA and HMAC-SHA)" (since obsoleted by RFC 6234): http://www.ietf.org/rfc/rfc4634.txt

answered Oct 26 '22 15:10

Babu Srinivasan

No, message length does not affect the likelihood of a collision.

If it did, the algorithm would be broken.

You can try it for yourself by running SHA against all one-byte inputs, then against all two-byte inputs and so on, and see if you get a collision. Probably not, because no one has ever found a collision for SHA-256 or SHA-512 (or at least they kept it a secret from Wikipedia).
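A minimal sketch of that experiment, using Python's standard `hashlib` module and SHA-256 (the helper name `find_collision` is just for this example). It stops at two-byte inputs, since the number of inputs grows by a factor of 256 with each extra byte.

```python
import hashlib
from itertools import product

def find_collision(max_length: int):
    """Hash every input of 1..max_length bytes with SHA-256.
    Return the first colliding pair found, or None if there is none."""
    seen = {}  # digest -> input that produced it
    for length in range(1, max_length + 1):
        for combo in product(range(256), repeat=length):
            data = bytes(combo)
            digest = hashlib.sha256(data).digest()
            if digest in seen:
                return seen[digest], data
            seen[digest] = data
    return None

# 256 one-byte inputs plus 65,536 two-byte inputs.
print(find_collision(2))   # None
```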

answered Oct 26 '22 14:10

Thilo



Tags:

hash

checksum

sha