I'm trying to think up a good hash function for strings. And I was thinking it might be a good idea to sum up the unicode values for the first five characters in the string (assuming it has five, otherwise stop where it ends). Would that be a good idea, or is it a bad one?
I am doing this in Java, but I wouldn't imagine that would make much of a difference.
If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes. you are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc.
Characteristics of a Good Hash Function. There are four main characteristics of a good hash function: 1) The hash value is fully determined by the data being hashed. 2) The hash function uses all the input data. 3) The hash function "uniformly" distributes the data across the entire set of possible hash values.
A good hash function should have the following properties: Efficiently computable. Should uniformly distribute the keys (Each table position equally likely for each key)
Usually hashes wouldn't do sums, otherwise stop
and pots
will have the same hash.
and you wouldn't limit it to the first n characters because otherwise house and houses would have the same hash.
Generally hashs take values and multiply it by a prime number (makes it more likely to generate unique hashes) So you could do something like:
int hash = 7; for (int i = 0; i < strlen; i++) { hash = hash*31 + charAt(i); }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With