Choosing a good hashing function, h(k), is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions. will provide uniform hashing.
To the time of writing, SHA-256 is still the most secure hashing algorithm out there. It has never been reverse engineered and is used by many software organizations and institutions, including the U.S. government, to protect sensitive information.
The most important property of hash functions is the size of the hash. A larger hash makes it more difficult to invert the function, and it ensures that the function is collision free. Because hash functions have a fixed output but unlimited inputs, multiple values can produce the same hash.
If you want to avoid collisions while sacrificing speed you will want cryptographic hash functions, of which MD5 (128 bits), SHA-1 (160 bits) and SHA-2 (usually SHA-256 or SHA-512) are the most widely used and have fast implementations.
In cryptography, hash functions provide three separate functions.
These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.
It has been shown that MD5 is not collision resistant, however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller key size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.
SHA1 has a flaw that allows collisions to be found in theoretically far less than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps - just barely within the current realm of computability. For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.
SHA2 is a new family of hash functions created following SHA1. Currently there are no known attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different key lengths.
RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.
In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.
The pigeonhole principle says that try as hard as you will you can not fit more than 2 pigeons in 2 holes (unless you cut the pigeons up). Similarly you can not fit 2^128 + 1 numbers in 2^128 slots. All hash functions result in a hash of finite size, this means that you can always find a collision if you search through "finite size" + 1 sequences. It's just not feasible to do so. Not for MD5 and not for Skein.
All the hash functions have collisions, its a fact of life. Coming across these collisions by accident is the equivalent of winning the intergalactic lottery. That is to say, no one wins the intergalactic lottery, its just not the way the lottery works. You will not come across an accidental MD5/SHA1/SHA2XXX hash, EVER. Every word in every dictionary, in every language, hashes to a different value. Every path name, on every machine in the entire planet has a different MD5/SHA1/SHA2XXX hash. How do I know that, you may ask. Well, as I said before, no one wins the intergalactic lottery, ever.
Sometimes the fact that its broken does not matter.
As it stands there are no known pre-image or second pre-image attacks on MD5.
So what is so broken about MD5, you may ask? It is possible for a third party to generate 2 messages, one of which is EVIL and another of which is GOOD that both hash to the same value. (Collision attack)
Nonetheless, the current RSA recommendation is not to use MD5 if you need pre-image resistance. People tend to err on the side of caution when it comes to security algorithms.
Repeat this after me, there are no chance MD5 collisions, malicious collisions can be carefully engineered. Even though there are no known pre-image attacks to date on MD5 the line from the security experts is that MD5 should not be used where you need to defend against pre-image attacks. SAME goes for SHA1.
Keep in mind, not all algorithms need to defend against pre-image or collision attacks. Take the trivial case of a first pass search for duplicate files on your HD.
No one ever found any SHA512 collision. EVER. They have tried really hard. For that matter no one ever found any SHA256 or 384 collision ever. .
RIPMED has not received the same amount of scrutiny that SHAX and MD5 has received. Both SHA1 and RIPEMD are vulnerable to birthday attacks. They are both slower than MD5 on .NET and come in the awkward 20 byte size. Its pointless to use these functions, forget about them.
SHA1 collision attacks are down to 2^52, its not going to be too long until SHA1 collisions are out in the wild.
For up to date information about the various hash functions have a look at the hash function zoo.
Having a fast hash function can be a curse. For example: a very common usage for hash functions is password storage. Essentially, you calculate hash of a password combined with a known random string (to impede rainbow attacks) and store that hash in the database.
The problem is, that if an attacker gets a dump of the database, he can, quite effectively guess passwords using brute-force. Every combination he tries only takes a fraction of millisecond, and he can try out hundreds of thousands of passwords a second.
To work around this issue, the bcrypt algorithm can be used, it is designed to be slow so the attacker will be heavily slowed down if attacking a system using bcrypt. Recently scrypt has made some headline and is considered by some to be more effective than bcrypt but I do not know of a .Net implementation.
Times have changed, we have a SHA3 winner. I would recommend using keccak (aka SHA3) winner of the SHA3 contest.
In order of weakest to strongest I would say:
Personally I'd use MD6, because one can never been too paranoid. If speed is a real concern I'd look at Skein, or SHA-256.
In MD5's defense, there is no known way to produce a file with an arbitrary MD5 hash. The original author must plan in advance to have a working collision. Thus if the receiver trusts the sender, MD5 is fine. MD5 is broken if the signer is malicious, but it is not known to be vulnerable to man-in-the-middle attacks.
It would be a good ideea to take a look at the BLAKE2 algorythm.
As it is described, it is faster than MD5 and at least as secure as SHA-3. It is also implemented by several software applications, including WinRar.
Which one you use really depends on what you are using it for. If you just want to make sure that files don't get corrupted in transit and aren't that concerned about security, go for fast and small. If you need digital signatures for multi-billion dollar federal bailout agreements and need to make sure they aren't forged, go for hard to spoof and slow.
I would like to chime in (before md5 gets torn apart) that I do still use md5 extensively despite its overwhelming brokenness for a lot of crypto.
As long as you don't care to protect against collisions (you are still safe to use md5 in an hmac as well) and you do want the speed (sometimes you want a slower hash) then you can still use md5 confidently.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With