I'm writing an application that uses hashing to speed up file comparisons. Basically I pre-hash file A, and then the app runs and matches files in a folder with previously hashed files. My current criteria for looking for a hash function are as follows:
So what's a good algorithm to use here? I'm using C#, but I'm sure most algorithms are available on any platform. Like I said, I'm currently using SHA-256, but I'm sure there's something better.
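A minimal sketch of the pre-hash-then-match workflow described above. It is written in Python for brevity; the same structure carries over to C#. `hash_file` and `find_matches` are hypothetical helper names. SHA-256 stands in for whichever hash you settle on, and storing the pre-hashed files as a digest → name dictionary is an assumption.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> bytes:
    """Stream a file through SHA-256 so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def find_matches(known: dict[bytes, str], folder: Path) -> list[tuple[Path, str]]:
    """Return (file, known_name) pairs whose digest matches a pre-hashed file."""
    matches = []
    for path in sorted(folder.iterdir()):
        if path.is_file():
            digest = hash_file(path)
            if digest in known:
                matches.append((path, known[digest]))
    return matches
```

Equal digests mean the files are almost certainly identical. If a false match would be costly, a byte-by-byte comparison of the matched pair can confirm it.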
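Yes, they are called perfect hash functions on Wikipedia; I've also seen them called collision-free hash functions. Note that a perfect hash function is only collision-free over a fixed set of keys known in advance. Any hash with a fixed-size output must have collisions over arbitrary inputs such as file contents.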
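To illustrate the "fixed set of keys" point, here is a sketch of the simplest way to build a perfect hash: try seeds until every known key lands in a distinct slot. The key names are hypothetical. Prefixing the seed to the key and hashing with SHA-256 is only an illustrative seeded hash. Practical perfect-hash generators use far smarter constructions than this brute-force search.

```python
import hashlib


def slot(key: str, seed: int, size: int) -> int:
    """Seeded hash of a key, reduced to a table slot (SHA-256 used for illustration)."""
    digest = hashlib.sha256(seed.to_bytes(4, "little") + key.encode()).digest()
    return int.from_bytes(digest[:8], "little") % size


def find_perfect_seed(keys: list[str], size: int, max_tries: int = 100_000) -> int:
    """Search for a seed under which every (distinct) key maps to a distinct slot."""
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be distinct")
    if size < len(keys):
        raise ValueError("table must have at least one slot per key")
    for seed in range(max_tries):
        if len({slot(k, seed, size) for k in keys}) == len(keys):
            return seed
    raise RuntimeError("no perfect seed found; try a larger table or more tries")
```

The result is collision-free only for the keys it was built from. A new key added later can collide with any of them.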
An example of a cryptographic hash function is SHA256. An example of a non-cryptographic hash function is CRC32.
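Both of these ship with Python's standard library (`hashlib` and `zlib`), which makes the difference in output size easy to see. The helper names below are hypothetical wrappers.

```python
import hashlib
import zlib


def sha256_digest(data: bytes) -> bytes:
    """Cryptographic: 256-bit output, designed so collisions are infeasible to find."""
    return hashlib.sha256(data).digest()


def crc32_checksum(data: bytes) -> int:
    """Non-cryptographic: 32-bit output, built for fast error detection.

    In Python 3, zlib.crc32 always returns an unsigned value in [0, 2**32).
    """
    return zlib.crc32(data)


if __name__ == "__main__":
    payload = b"example file contents"
    print(sha256_digest(payload).hex())      # 64 hex characters (32 bytes)
    print(f"{crc32_checksum(payload):08x}")  # 8 hex characters (4 bytes)
```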
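bcrypt is deliberately slow, with a tunable cost factor, which is why it is commonly recommended for hashing passwords. That same slowness makes it a poor fit for fast file comparison, and it only processes the first 72 bytes of its input anyway.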
Yann Collet's xxHash may be a good choice (Home page, GitHub)
xxHash is an extremely fast non-cryptographic hash algorithm, working at speeds close to RAM limits. It comes in two flavors: 32-bit and 64-bit.
At least 4 C# implementations are available (see the home page).
I had excellent results with it in the past.
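To show what makes xxHash fast (four independent accumulators consuming 16-byte stripes, then a short avalanche), here is a pure-Python transcription of the 32-bit XXH32 variant. It is for illustration only and is far slower than the native or C# implementations you would actually use. It is checked here only against the empty-input test vector 0x02CC5D05.

```python
PRIME32_1 = 0x9E3779B1
PRIME32_2 = 0x85EBCA77
PRIME32_3 = 0xC2B2AE3D
PRIME32_4 = 0x27D4EB2F
PRIME32_5 = 0x165667B1
MASK32 = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    """32-bit rotate left."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _round(acc: int, lane: int) -> int:
    """Mix one 4-byte lane into an accumulator."""
    acc = (acc + lane * PRIME32_2) & MASK32
    return (_rotl(acc, 13) * PRIME32_1) & MASK32


def _u32(data: bytes, i: int) -> int:
    """Read a little-endian 32-bit word at offset i."""
    return int.from_bytes(data[i:i + 4], "little")


def xxh32(data: bytes, seed: int = 0) -> int:
    n = len(data)
    i = 0
    if n >= 16:
        # Four accumulators process 16-byte stripes independently.
        v1 = (seed + PRIME32_1 + PRIME32_2) & MASK32
        v2 = (seed + PRIME32_2) & MASK32
        v3 = seed & MASK32
        v4 = (seed - PRIME32_1) & MASK32
        while i + 16 <= n:
            v1 = _round(v1, _u32(data, i))
            v2 = _round(v2, _u32(data, i + 4))
            v3 = _round(v3, _u32(data, i + 8))
            v4 = _round(v4, _u32(data, i + 12))
            i += 16
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK32
    else:
        h = (seed + PRIME32_5) & MASK32
    h = (h + n) & MASK32
    # Remaining 4-byte words, then remaining single bytes.
    while i + 4 <= n:
        h = (h + _u32(data, i) * PRIME32_3) & MASK32
        h = (_rotl(h, 17) * PRIME32_4) & MASK32
        i += 4
    while i < n:
        h = (h + data[i] * PRIME32_5) & MASK32
        h = (_rotl(h, 11) * PRIME32_1) & MASK32
        i += 1
    # Final avalanche spreads every input bit across the output.
    h ^= h >> 15
    h = (h * PRIME32_2) & MASK32
    h ^= h >> 13
    h = (h * PRIME32_3) & MASK32
    h ^= h >> 16
    return h
```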
The hash size is 32 or 64 bits, but XXH3 is in the making (it has since been declared stable in xxHash v0.8.0):
XXH3 features a wide internal state of 512 bits, which makes it suitable for generating hashes of up to 256 bits. For the time being, only 64-bit and 128-bit variants are exposed, but a similar recipe could be used for a 256-bit variant if there is ever a need for one. All variants run at the same speed, since only the finalization stage differs.
In general, the longer the hash, the slower it is to compute. A 64-bit hash is good enough for most practical purposes.
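The "64-bit is good enough" claim follows from the birthday bound. For n distinct files and a b-bit hash that behaves like a random function (an idealizing assumption), the chance of at least one collision is approximately p ≈ 1 − exp(−n(n−1) / 2^(b+1)). `collision_probability` is a hypothetical helper that evaluates this.

```python
import math


def collision_probability(n_files: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision among n_files hashes).

    Assumes the hash output is uniformly random over 2**bits values.
    """
    pairs = n_files * (n_files - 1) / 2
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    print(collision_probability(1_000_000, 64))  # about 2.7e-8
    print(collision_probability(100_000, 32))    # about 0.69
```

So a million files with a 64-bit hash collide with a probability of a few in a hundred million. With only 32 bits, a collision is more likely than not at a hundred thousand files.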
You can generate longer hashes by combining two hash functions (e.g. 128-bit XXH3 and 128-bit MurmurHash3).
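A sketch of the concatenation idea. XXH3-128 and MurmurHash3-128 are not in Python's standard library, so BLAKE2b truncated to 128 bits and MD5 (also 128 bits) stand in for them here. `combined_256` is a hypothetical name. The result is wider, but for iterated hash designs, concatenation is known to add much less collision resistance than a single well-designed hash of the combined width would give.

```python
import hashlib


def combined_256(data: bytes) -> bytes:
    """Concatenate two 128-bit digests from different hash functions into 256 bits.

    BLAKE2b-128 and MD5 are stand-ins for XXH3-128 and MurmurHash3-128.
    """
    part_a = hashlib.blake2b(data, digest_size=16).digest()
    part_b = hashlib.md5(data).digest()
    return part_a + part_b
```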