I'm trying to choose a hash algorithm for comparing about max 20 different text data.
Which hash is better for these requirements?
I'm using hash for less memory footprint and comparison performance
SHA-1 is fastest hashing function with ~587.9 ms per 1M operations for short strings and 881.7 ms per 1M for longer strings. MD5 is 7.6% slower than SHA-1 for short strings and 1.3% for longer strings. SHA-256 is 15.5% slower than SHA-1 for short strings and 23.4% for longer strings.
MD5 is known to be generally faster than SHA256 .
Save this answer. Show activity on this post. If you want to check if two files are the same, CRC32 checksum is the way to go because it's faster than MD5. But be careful: CRC only reliably tells you if the binaries are different; it doesn't tell you if they're identical.
SHA-256: This hashing algorithm is a variant of the SHA2 hashing algorithm, recommended and approved by the National Institute of Standards and Technology (NIST). It generates a 256-bit hash value. Even if it's 30% slower than the previous algorithms, it's more complicated, thus, it's more secure.
If collision is not a big deal you can take the first letter of each document. Or you can use the length of the text or the string with the text.
Paul Hsieh has a decent, simple, fast, 32-bit SuperFastHash that performs better than most existing hash functions, is easier to understand/implement, and sounds like it meets your criteria.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With