On a new system, we require a one-way-hash to compute a digital signature from binary input (e.g., a kilobyte of text, or larger text-and-binary files). The need is similar to how Scons (build system) hashes command-lines and source files, and how Git (version control system) hashes files to compute a signature for storage/synchronization.
Recall that Scons uses MD5, and Git uses SHA-1.
While MD5 and SHA-1 have been "broken", neither Scons nor Git uses its hash specifically for security (e.g., it's not to store passwords), so general practice still considers those algorithms acceptable for that usage. (Of course, this is partially a rationalization due to legacy adoption.)
QUESTION: Would you use SHA256 (not MD5 nor SHA-1) for a (non-crypto/security) one-way hash in a new system?
The concerns are:
I'd be particularly interested in an answer consistent with the Scons or Git communities saying, "We'll keep ours forever!" or "We want to move to a new hash as soon as practical!" (I'm not sure what their plans are?)
The SHA256 algorithm generates an almost-unique, fixed-size 256-bit (32-byte) hash. A hash is a so-called one-way function: it cannot practically be reversed to recover the input. This makes it suitable for checking the integrity of your data, challenge-hash authentication, anti-tamper checks, digital signatures, and blockchain.
The SHA-256 algorithm returns a hash value of 256 bits, or 64 hexadecimal digits. While not quite perfect, current research indicates it is considerably more secure than either MD5 or SHA-1. Performance-wise, a SHA-256 hash is about 20-30% slower to calculate than either MD5 or SHA-1 hashes, though the exact gap depends on hardware and implementation.
MD5, SHA-1, and SHA-256 are all different hash functions. Software creators often take a file download, like a Linux .iso file or even a Windows .exe file, and run it through a hash function. They then offer an official list of the hashes on their websites.
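As a rough illustration of that download check, here is a minimal Python sketch using the standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the published hash is assumed to be a plain hex string copied from the vendor's site.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large downloads do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare a file's digest with a published one (hypothetical helper).

    Published hashes are sometimes upper-case or padded with whitespace,
    so normalize before comparing.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

The same pattern works for MD5 or SHA-1 by swapping `hashlib.sha256()` for `hashlib.md5()` or `hashlib.sha1()`; only the digest length changes.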
As SHA1 has been deprecated due to its security vulnerabilities, it is important to ensure you are no longer using an SSL certificate that is signed using SHA1. All major SSL certificate issuers now use SHA256, which is more secure and trustworthy.
Yes, I would use SHA-256. SHA-256 was designed with a lot more than security in mind; in fact, one of the reasons SHA1 needed to be replaced concerns the very property you need from a hash function here: avoiding collisions. A hash algorithm produces a fixed-size output from an input of arbitrary size, so eventually there will be a collision. The larger the output, the less likely a collision is (when using a proper hash algorithm).
Git went with SHA1 because it uses the hashes as file names and wanted them to be small and compact. SHA256 produces a larger digest (32 bytes versus 20), consuming more disk space and requiring more data to be transmitted over the wire. This question specifically addresses what would happen if git were to encounter collisions.
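To make the size trade-off concrete, here is a small Python sketch (the function name `hex_digest_lengths` is invented for this example). It shows that each algorithm's digest length is fixed no matter how large the input is, and how the three compare.

```python
import hashlib


def hex_digest_lengths(data: bytes) -> dict:
    """Map each algorithm name to the length of its hex digest for `data`."""
    return {
        name: len(hashlib.new(name, data).hexdigest())
        for name in ("md5", "sha1", "sha256")
    }


if __name__ == "__main__":
    # Inputs of very different sizes: empty, about 1 KB, and 1 MB of zero bytes.
    for sample in (b"", b"a kilobyte of text..." * 50, bytes(1_000_000)):
        print(len(sample), hex_digest_lengths(sample))
```

Every input yields 32 hex characters for MD5, 40 for SHA-1, and 64 for SHA-256, so using SHA-256 for file names costs 24 more characters per name than SHA-1.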
To look at your points:
The probability of a non-malicious collision is vanishingly small, even with MD5. Here is a thought experiment:
A well-stuffed hard drive may have 1M files. For the experiment, imagine there are 10M files. Let's say that the world population is 10,000M people, each with one computer, and every file is different.
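We would be contending with 10E6 * 10E9 = 1E17 different files, which is less than 2^57.
The probability that any one given file shares its MD5 hash with another file in such a far-fetched case would be less than 2^57 / 2^128 = 1 in 2^71, or less than one in approximately 2E21! To put this in perspective, for that probability to reach 1 in 10M we would have to repeat the experiment roughly 2E14 times, which is to say replacing every file, every hour since the big bang, and then keep going for a few more billion years. (The chance that some pair among all 1E17 files collides is a different, larger quantity: by the birthday bound it is roughly n^2 / 2^129, about 1 in 70,000 for MD5 at this scale, and about 1 in 2E43 for SHA256.)
2E14 / 24 / 365 / 13500E6 = 1.69 (that is, 2E14 hours is about 1.69 times the age of the universe)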
Of course, with SHA1 or SHA256, the probabilities would be even smaller. SHA256 would also resist a malicious attack: unlike MD5 (and SHA1, for which a practical collision was demonstrated in 2017), no one can currently construct files on purpose so that they have the same hash.
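The arithmetic of the thought experiment can be checked with a few lines of Python. This is only a sketch, assuming an ideal hash whose outputs are uniformly random; the function names are invented for illustration. It computes two different quantities: the chance that one given file collides with any other file, and the birthday-bound chance that any pair among all files collides, which is considerably larger.

```python
import math


def single_file_collision_probability(n_items: int, hash_bits: int) -> float:
    """Upper bound on the chance that one given file has the same hash
    as any of the other n_items - 1 files (ideal hash assumed)."""
    return min(1.0, (n_items - 1) / 2 ** hash_bits)


def birthday_collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least one pair among n_items collides.

    Uses the standard approximation 1 - exp(-n(n-1) / 2^(bits+1)).
    expm1 keeps precision when the result is extremely small.
    """
    exponent = -n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(exponent)


if __name__ == "__main__":
    n_files = 10**7 * 10**10  # 10M files on each of 10,000M computers
    for name, bits in (("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)):
        print(
            name,
            single_file_collision_probability(n_files, bits),
            birthday_collision_probability(n_files, bits),
        )
```

At the scale of a single system rather than the whole world (say 10M files), both figures are negligible for all three algorithms, so the choice between them rests mainly on resistance to deliberate collisions.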