I am reading about cuckoo hashing from Pagh and Rodler and I can't understand the meaning of this paragraph:
It may happen that this process loops, as shown in Fig. 1(b). Therefore the number of iterations is bounded by a value “MaxLoop” to be specified in Section 2.3. If this number of iterations is reached, we rehash the keys in the tables using new hash functions, and try once again to accommodate the nestless key. There is no need to allocate new tables for the rehashing: We may simply run through the tables to delete and perform the usual insertion procedure on all keys found not to be at their intended position in the table.
What does it mean by using new hash functions?
In the insert algorithm, the table is rehashed. Are we supposed to have a "pool" of hash functions to draw from somehow? How do we create this pool?
Yes, they're expecting new hash functions, just like they say. Fortunately, they don't require a pile of new algorithms, just slightly different hashing behavior on your current data set.
Take a look at section 2.1 of the paper, and then Appendix A. It discusses the construction of random universal hash functions.
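To make "random universal hash functions" concrete, here is a minimal sketch of a classic universal family, ((a*x + b) mod p) mod m, with a and b drawn at random. This is for illustration only; the paper's own construction in Appendix A is different, and the names here (make_hash, P) are my own.

```python
import random

# P is a prime larger than any key we expect to hash.
# 2^31 - 1 is a Mersenne prime, convenient for 31-bit keys.
P = 2_147_483_647

def make_hash(m, seed=None):
    """Draw one function at random from the universal family
    h(x) = ((a*x + b) mod P) mod m, where m is the table size."""
    rng = random.Random(seed)
    a = rng.randrange(1, P)  # a must be nonzero
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

# On a rehash, cuckoo hashing simply draws two fresh functions:
h1 = make_hash(1024, seed=1)
h2 = make_hash(1024, seed=2)
```

Drawing new (a, b) pairs is the whole "pool": the family is parameterized, so you get a fresh hash function just by picking fresh random parameters.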
A simple, hopefully illustrative example: suppose you've got some ordinary hash function you like that operates on blocks of bytes; call it H(x). You want to use it to produce a family of new, slightly different hash functions H_n(x). Well, if H(x) is good and your requirements are weak, you can just define H_n(x) = H(concat(n, x)). You don't have nice strong guarantees about the behavior of H_n(x), but you'd expect most of them to be different.