While reading about the pigeonhole principle on Wikipedia, I came across this: "collisions are inevitable in a hash table because the number of possible keys exceeds the number of indices in the array. No hashing algorithm, no matter how clever, can avoid these collisions". But isn't gperf doing exactly what this says is impossible?
Please enlighten me.
Perfect hashing is defined as a model of hashing in which any set of n elements can be stored in a hash table of equal size and can have lookups performed in constant time.
Definition: A hash function that maps each different key to a distinct integer. Usually all possible keys must be known beforehand. A hash table that uses a perfect hash function has no collisions. Formal definition: a function f is perfect for a set of keys K iff ∀ j, k ∈ K: f(j) = f(k) → j = k.
A perfect hash function maps every element to a distinct integer with no collisions; in mathematical terms, it is an injective function. A minimal perfect hash (MPH) function maps n keys onto the n consecutive integers [0, n-1].
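To make the definition concrete, here is a hand-rolled minimal perfect hash in C for a hypothetical three-key set. The keys, the length trick, and the function name `mph` are my own illustration, not anything gperf produces:

```c
#include <stdio.h>
#include <string.h>

/* A hand-rolled minimal perfect hash for the fixed key set
 * {"red", "blue", "green"}. The three lengths 3, 4, 5 are distinct,
 * so length - 3 maps the keys injectively onto 0, 1, 2. */
static unsigned mph(const char *key) {
    return (unsigned)strlen(key) - 3;
}

int main(void) {
    const char *keys[] = { "red", "blue", "green" };
    for (int i = 0; i < 3; i++)
        printf("%-5s -> slot %u\n", keys[i], mph(keys[i]));
    return 0;
}
```

For exactly these three keys the function is injective onto [0, 2], so the table needs exactly three slots and no collision handling; for any other input the guarantee simply does not apply.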
gperf generates the hash function and the hash table from a predefined list of strings. It therefore appears to me that gperf makes the hash range large enough that every string in the list can get its own slot.
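For illustration, here is a simplified C sketch of the *shape* of what gperf emits, not actual gperf output: the word list, the length-based hash, and the table layout are all hypothetical stand-ins (gperf's real generated hash uses per-character association tables), though `in_word_set` is the name gperf gives its generated lookup function:

```c
#include <stdio.h>
#include <string.h>

#define MIN_WORD_LENGTH 2
#define MAX_WORD_LENGTH 6
#define TABLE_SIZE 5

/* Table built for the fixed list; slot 1 stays empty, so this hash is
 * perfect but not minimal (which gperf also allows). */
static const char *wordlist[TABLE_SIZE] = {
    "if", "", "else", "while", "return"
};

/* Perfect on exactly this list: the four lengths 2, 4, 5, 6 are
 * distinct, so length - 2 sends each word to its own slot. */
static unsigned hash(const char *s, size_t len) {
    (void)s;  /* this toy hash only needs the length */
    return (unsigned)(len - MIN_WORD_LENGTH);
}

/* Lookup in the style of gperf's generated in_word_set(): hash once,
 * then one strcmp to reject strings outside the original list. */
static const char *in_word_set(const char *s) {
    size_t len = strlen(s);
    if (len < MIN_WORD_LENGTH || len > MAX_WORD_LENGTH)
        return NULL;
    unsigned h = hash(s, len);
    if (strcmp(s, wordlist[h]) == 0)
        return wordlist[h];
    return NULL;
}

int main(void) {
    printf("while  -> %s\n", in_word_set("while")  ? "found" : "absent");
    printf("helloo -> %s\n", in_word_set("helloo") ? "found" : "absent");
    return 0;
}
```

Note the final strcmp: the hash is only collision-free for the predefined list, so an unknown string that happens to land in an occupied slot still gets rejected.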
That only works if you know every possible key up front, an assumption that does not hold for the description in the Wikipedia entry, which apparently refers to a hash table whose key set is not fixed in advance.
From the gperf website: "For a given list of strings, it produces a hash function and hash table,..." - which means it has to know all the strings in advance in order to construct a function that works without collisions.
The usual hash functions in general-purpose programming languages have to accept arbitrary strings, fed in one after another (the full list is never given at once), and therefore they can produce collisions.
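To see the pigeonhole principle in action with an ordinary general-purpose hash, here is a small C demo; the nine keys and the eight-slot table are arbitrary choices of mine, and djb2 merely stands in for any string hash. Hashing nine distinct strings into eight slots must put at least two in the same slot:

```c
#include <stdio.h>

#define SLOTS 8

/* djb2, a classic general-purpose string hash. */
static unsigned long djb2(const char *s) {
    unsigned long h = 5381;
    for (; *s; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

int main(void) {
    /* Nine distinct keys, eight slots: a collision is guaranteed by the
     * pigeonhole principle, regardless of which hash function we pick. */
    const char *keys[] = { "red", "orange", "yellow", "green", "blue",
                           "indigo", "violet", "black", "white" };
    const char *seen[SLOTS] = { 0 };
    for (int i = 0; i < 9; i++) {
        unsigned slot = (unsigned)(djb2(keys[i]) % SLOTS);
        if (seen[slot])
            printf("collision in slot %u: \"%s\" and \"%s\"\n",
                   slot, seen[slot], keys[i]);
        else
            seen[slot] = keys[i];
    }
    return 0;
}
```

Which pair collides depends on the hash function, but *some* pair always will; that is exactly the claim in the Wikipedia quote, and it is why general-purpose hash tables need a collision-resolution strategy while gperf's tables do not.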