Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good Hash Function for Strings

I'm trying to think up a good hash function for strings. And I was thinking it might be a good idea to sum up the unicode values for the first five characters in the string (assuming it has five, otherwise stop where it ends). Would that be a good idea, or is it a bad one?

I am doing this in Java, but I wouldn't imagine that would make much of a difference.

like image 876
Leif Andersen Avatar asked Apr 12 '10 17:04

Leif Andersen


People also ask

Which string hashing is best?

If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes. you are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc.

What is a good hash function?

Characteristics of a Good Hash Function. There are four main characteristics of a good hash function: 1) The hash value is fully determined by the data being hashed. 2) The hash function uses all the input data. 3) The hash function "uniformly" distributes the data across the entire set of possible hash values.

How do you choose a good hash function?

A good hash function should have the following properties: Efficiently computable. Should uniformly distribute the keys (Each table position equally likely for each key)


1 Answers

Usually hashes wouldn't do sums, otherwise stop and pots will have the same hash.

and you wouldn't limit it to the first n characters because otherwise house and houses would have the same hash.

Generally hashs take values and multiply it by a prime number (makes it more likely to generate unique hashes) So you could do something like:

int hash = 7; for (int i = 0; i < strlen; i++) {     hash = hash*31 + charAt(i); } 
like image 94
jonathanasdf Avatar answered Sep 18 '22 18:09

jonathanasdf