Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's a good hash function for English words?

Tags:

I have a long list of English words and I would like to hash them. What would be a good hashing function? So far my hashing function sums the ASCII values of the letters then modulo the table size. I'm looking for something efficient and simple.

like image 567
Mike G Avatar asked Oct 08 '11 23:10

Mike G


People also ask

What is a good hash function?

Characteristics of a Good Hash Function. There are four main characteristics of a good hash function: 1) The hash value is fully determined by the data being hashed. 2) The hash function uses all the input data. 3) The hash function "uniformly" distributes the data across the entire set of possible hash values.

What are 2 well known hash functions?

The most common hash functions used in digital forensics are Message Digest 5 (MD5), and Secure Hashing Algorithm (SHA) 1 and 2.

What are hash functions examples?

Hash functions (hashing algorithms) used in computer cryptography are known as "cryptographic hash functions". Examples of such functions are SHA-256 and SHA3-256, which transform arbitrary input to 256-bit output.

What is hash function in simple words?

Definition: A hash function is a function that takes a set of inputs of any arbitrary size and fits them into a table or other data structure that contains fixed-size elements.


2 Answers

To simply sum the letters is not a good strategy because a permutation gives the same result.

This one (djb2) is quite popular and works nicely with ASCII strings.

unsigned long hashstring(unsigned char *str) {     unsigned long hash = 5381;     int c;      while (c = *str++)         hash = ((hash << 5) + hash) + c; /* hash * 33 + c */      return hash; } 

More info here.

If you need more alternatives and some perfomance measures, read here.

Added: These are general hashing functions, where the input domain is not known in advance (except perhaps some very general assumptions: eg the above works slightly better with ascii input), which is the most usual scenario. If you have a known restricted domain (set of inputs fixed) you can do better, see Fionn's answer.

like image 100
leonbloy Avatar answered Sep 17 '22 18:09

leonbloy


Maybe something like this would help you: http://www.gnu.org/s/gperf/

It generates a optimized hashing function for the input domain.

like image 42
Fionn Avatar answered Sep 16 '22 18:09

Fionn