Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How would you go about designing a function for a perfect hash?

The domain of interest is string matching. Assume I have a structure like this.

typedef struct
{
    char *name,
    int (*function)();

} StringArray

StringArray s[] = 
{
    {"George", func1},
    {"Paul",   func2},
    {"Ringo",  func3},
    {"John",   func4},
    {"",       NULL}   /* End of list */ 
}

There are a fixed number of strings in the array. They are hard coded as in the example. If the table changes, there would be a need to re-evaluate the quality of the hash function.

I want to apply a hash function to a string, and if the string matches one in the array, then call the function. A perfect hash function is needed for this. No collisions are allowed.The purpose of requiring hashing is to get O(1) performance on the lookup.

What ideas do you have on designing a function to do this?

like image 819
EvilTeach Avatar asked Apr 09 '09 15:04

EvilTeach


1 Answers

The summary lists both C and C++. Which of them are you looking for? C and C++ are two distinct languages, and differ greatly in their string handling and data structures (and the fact that the C ones work in C++ doesn't change that).

Why, specifically, do you want a perfect hash function? Is it that you want to associate a string with a function, and thought that would be a good way to do it? Is this some sort of homework assignment? Do you have a reason not to use map<> in C++? (Or unordered_map<> if available?)

If you do need a perfect hash, what are the constraints on the strings? Will there be a certain fixed set you want to dispatch on? What about strings that don't match one of the set? Are you willing to accept hits from random strings, or is the number of incoming strings limited?

If you could edit your question to include information like that, we could be a lot more helpful.

EDIT (in response to the first two comments):

OK, we should look at C solutions, since you presumably want this for both your C and C++ work. You presumably want the performance, but have you tested? If we're dealing with strings coming in on the I/O system, the time there is likely to dwarf the dispatch time.

You are expecting arbitrary strings. It's a bit much to expect a perfect hash function that will avoid all collisions from random data, so you do need to consider that.

Have you considered a trie? It may be more efficient than a perfect hash function (or may not be), it should be fairly easy to implement in C, and it will avoid problems with redoing your list of dispatching strings or possible collisions.

like image 170
David Thornley Avatar answered Oct 18 '22 16:10

David Thornley