I'm working on design of Content-addressable storage, so I'm looking for a hash function to generate object identifiers. Every object should get short ID based on its content in that way: object_id = hash(object_content)
.
Prerequisites:
32
bytes in order to address 256^32
objects at max (but this requirement may be relaxed).Taking into account these requirements, I picked up SHA256
hash, but unfortunately it's not fast enough for my purposes. The fastest implementations of SHA256
that I was able to benchmark were openssl
and boringssl
: on my desktop Intel Core I5 6400
it gave about 420 MB/s
per core. Other implementations (like crypto/rsa
in Go) are even slower. I would like to replace SHA256
with other hash function that provides the same collision guarantees as SHA256
, but gives betters throughput (at least 600 MB/s
per core).
Please share your opinion about possible options to solve this problem.
Also I would like to note that hardware update (like purchasing modern CPU with AVX512
instruction set) is not possible. The main point is to find hash function that will provide better performance on commodity hardware.
Check out Cityhash and HighwayHash. Both have 256-bit variants, and much faster than SHA256. Cityhash is faster, but it is a non-cryptographic hash. HighwayHash is slower (but still faster than SHA256), and a secure hash.
All modern non-cryptographic hashes are much faster than SHA256. If you're willing to use a 128-bit hash, you'll have more options.
Note, that you may want to consider using a 128-bit hash, as it may be adequate for your purpose. For example, if you have 1010 different objects, the probability that you have a collision with a quality 128-bit hash is less than 10-18. Check out the table here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With