Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When generating a SHA256 / 512 hash, is there a minimum 'safe' amount of data to hash?

Tags:

hash

checksum

sha

I have heard that when creating a hash, it's possible that if small files or amounts of data are used, the resulting hash is more likely to suffer from a collision. If that is true, is there a minimum "safe" amount of data that should be used to ensure this doesn't happen?

I guess the question could also be phrased as:

What is the smallest amount of data that can be safely and securely hashed?

like image 728
PeterM Avatar asked Jan 13 '11 04:01

PeterM


People also ask

How big is an SHA-512 hash?

The hash size for the SHA512 algorithm is 512 bits.

How much data can be stored in a SHA256 hash?

There is technically a limit, but it's quite large. The padding scheme used for SHA-256 requires that the size of the input (in bits) be expressed as a 64-bit number. Therefore, the maximum size is (264-1)/8 bytes ~= 2'091'752 terabytes.

Is SHA-512 safer than SHA-256?

Due to the higher collision propability of passwords with sha-256 the use of sha-512 is more recommended. That means in fact: In case of a rainbowtable-attack the passwords hashed with sha-256 algorithm are easier to crack.

How safe is SHA-512?

The SHA1, SHA256, and SHA512 functions are no longer considered secure, either, and PBKDF2 is considered acceptable. The most secure current hash functions are BCRYPT, SCRYPT, and Argon2. In addition to the hash function, the scheme should always use a salt.


3 Answers

A hash function accepts inputs of arbitrary (or at least very high) length, and produces a fixed-length output. There are more possible inputs than possible outputs, so collisions must exist. The whole point of a secure hash function is that it is "collision resistant", which means that while collisions must mathematically exist, it is very very hard to actually compute one. Thus, there is no known collision for SHA-256 and SHA-512, and the best known methods for computing one (by doing it on purpose) are so ludicrously expensive that they will not be applied soon (the whole US federal budget for a century would buy only a ridiculously small part of the task).

So, if it cannot be realistically done on purpose, you can expect not to hit a collision out of (bad) luck.

Moreover, if you limit yourself to very short inputs, there is a chance that there is no collision at all. E.g., if you consider 12-byte inputs: there are 296 possible sequences of 12 bytes. That's huge (more than can be enumerated with today's technology). Yet, SHA-256 will map each input to a 256-bit value, i.e. values in a much wider space (of size 2256). We cannot prove it formally, but chances are that all those 296 hash values are distinct from each other. Note that this has no practical consequence: there is no measurable difference between not finding a collision because there is none, and not finding a collision because it is extremely improbable to hit one.

Just to illustrate how low risks of collision are with SHA-256: consider your risks of being mauled by a gorilla escaped from a local zoo or private owner. Unlikely? Yes, but it still may conceivably happen: it seems that a gorilla escaped from the Dallas zoo in 2004 and injured four persons; another gorilla escaped from the same zoo in 2010. Assuming that there is only one rampaging gorilla every 6 years on the whole Earth (not only in the Dallas area) and you happen to be the unlucky chap who is on his path, out of a human population of 6.5 billions, then risks of grievous-bodily-harm-by-gorilla can be estimated at about 1 in 243.7 per day. Now, take 10 thousands of PC and have them work on finding a collision for SHA-256. The chances of hitting a collision are close to 1 in 275 per day -- more than a billion less probable than the angry ape thing. The conclusion is that if you fear SHA-256 collisions but do not keep with you a loaded shotgun at all times, then you are getting your priorities wrong. Also, do not mess with Texas.

like image 138
Thomas Pornin Avatar answered Oct 26 '22 15:10

Thomas Pornin


There is no minimum input size. SHA-256 algorithm is effectively a random mapping and collision probability doesn't depend on input length. Even a 1 bit input is 'safe'.

Note that the input is padded to a multiple of 512 bits (64 bytes) for SHA-256 (multiple of 1024 for SHA-512). Taking a 12 byte input (as Thomas used in his example), when using SHA-256, there are 2^96 possible sequences of length 64 bytes.

As an example, a 12 byte input Hello There! (0x48656c6c6f20546865726521) will be padded with a one bit, followed by 351 zero bits followed by the 64 bit representation of the length of the input in bits which is 0x0000000000000060 to form a 512 bit padded message. This 512 bit message is used as the input for computing the hash.

More details can be found in RFC: 4634 "US Secure Hash Algorithms (SHA and HMAC-SHA)", http://www.ietf.org/rfc/rfc4634.txt

like image 45
Babu Srinivasan Avatar answered Oct 26 '22 15:10

Babu Srinivasan


No, message length does not effect the likeliness of a collision.

If that were the case, the algorithm is broken.

You can try for yourself by running SHA against all one-byte inputs, then against all two-byte inputs and so on, and see if you get a collision. Probably not, because no one has ever found a collision for SHA-256 or SHA-512 (or at least they kept it a secret from Wikipedia)

like image 31
Thilo Avatar answered Oct 26 '22 14:10

Thilo