Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Analyzing goals and choosing a good hash function

Tags:

hashtable

hash

This isn’t a specific question with a specific solution; but it’s rather a response to the fact that I can’t find any good Stack Overflow qestions about how to choose a good a hashing function for hash tables and similar tasks.

So! Let’s talk hash functions, and how to choose one. How should a programming noob, who needs to choose a good hash function for their specific task, go about choosing one? When is the simple and quick Fowler-Noll-Vo appropriate? When should they vendor in MurmurHash3 instead? Do you have any links to good resources on comparing the various options?

like image 871
ELLIOTTCABLE Avatar asked Sep 04 '11 18:09

ELLIOTTCABLE


2 Answers

The hash function for hash tables should have these two properties

  • Uniformity all outputs of H() should be evenly distributed as much as possible. In other words the for 32-bit hash function the probability for every output should be equal to 1/2^32. (for n-bit it should be 1/2^n). With uniform hash function the chance of collision is minimized to lowest possible for any possible input.
  • Low computational cost Hash functions for tables are expected to be FAST, compared to cryptographic hash functions where speed is traded for preimage resistance (eg it is hard to find the message from given hash value) and collision resistance.

For purposes of hash tables all cryptographic functions are BAD choice, since the computational cost is enormous. Because hashing here is used not for security but for fast access. MurmurHash is considered one of the fastest and uniform functions suitable for big hash tables or hash indexes. For small tables a trivial hash function should be OK. A trivial hash is where we mix values of object (by multiplication, addition and subtraction with some prime).

like image 92
Andreas Bakurov Avatar answered Sep 18 '22 23:09

Andreas Bakurov


If your hash keys are strings (or other variable-length data) you might look at this paper by Ramakrishna and Zobel. They benchmark a few classes of hashing functions (for speed and low collisions) and exhibit a class that is better than the usual Bernstein hashes.

like image 34
phs Avatar answered Sep 19 '22 23:09

phs