I have lots of unrelated named things that I'd like to do quick searches against. An "aardvark" is always an "aardvark" everywhere, so hashing the string and reusing the integer would work well to speed up comparisons. The entire set of names is unknown (and changes over time). What is a fast string hashing algorithm that will generate small (32 or 16) bit values and have a low collision rate?
I'd like to see an optimized implementation specific to C/C++.
SHA-1 is fastest hashing function with ~587.9 ms per 1M operations for short strings and 881.7 ms per 1M for longer strings. MD5 is 7.6% slower than SHA-1 for short strings and 1.3% for longer strings.
Take into account the following hash algorithms – CRC-32, MD5, and SHA-1. These are common hash algorithms with varying levels of collision risk.
In particular, cryptographic hash functions exhibit these three properties: They are “collision-free.” This means that no two input hashes should map to the same output hash. They can be hidden. It should be difficult to guess the input value for a hash function from its output.
If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes. you are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc.
Murmur Hash is pretty nice.
One of the FNV variants should meet your requirements. They're fast, and produce fairly evenly distributed outputs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With