Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash from object's content as object ID: fast alternatives for SHA256

I'm working on design of Content-addressable storage, so I'm looking for a hash function to generate object identifiers. Every object should get short ID based on its content in that way: object_id = hash(object_content).

Prerequisites:

  1. Hash-function should be fast.
  2. Collision probability must be as low as possible.
  3. Optimal ID length is 32 bytes in order to address 256^32 objects at max (but this requirement may be relaxed).

Taking into account these requirements, I picked up SHA256 hash, but unfortunately it's not fast enough for my purposes. The fastest implementations of SHA256 that I was able to benchmark were openssl and boringssl: on my desktop Intel Core I5 6400 it gave about 420 MB/s per core. Other implementations (like crypto/rsa in Go) are even slower. I would like to replace SHA256 with other hash function that provides the same collision guarantees as SHA256, but gives betters throughput (at least 600 MB/s per core).

Please share your opinion about possible options to solve this problem.

Also I would like to note that hardware update (like purchasing modern CPU with AVX512 instruction set) is not possible. The main point is to find hash function that will provide better performance on commodity hardware.

like image 813
Vitaly Isaev Avatar asked Oct 16 '22 11:10

Vitaly Isaev


1 Answers

Check out Cityhash and HighwayHash. Both have 256-bit variants, and much faster than SHA256. Cityhash is faster, but it is a non-cryptographic hash. HighwayHash is slower (but still faster than SHA256), and a secure hash.

All modern non-cryptographic hashes are much faster than SHA256. If you're willing to use a 128-bit hash, you'll have more options.

Note, that you may want to consider using a 128-bit hash, as it may be adequate for your purpose. For example, if you have 1010 different objects, the probability that you have a collision with a quality 128-bit hash is less than 10-18. Check out the table here.

like image 85
geza Avatar answered Oct 21 '22 07:10

geza