The following demonstration was taken from the Algorithms: Design and Analysis video lecture on Coursera provided by Tim Roughgarden (phenomenal explanations btw). From what I understand, this hash function effectively returns an existing index position between 0 to n in the hash table. However, I am confounded by how this unique computation can be reverse engineered for constant time look-ups when a (below) is a randomized integer.
Let U = IP addresses (of the form (x1, x2, x3, x4)
with each subscript of x in {0,1,2,...255})
Let n = a prime number
Define one hash function per 4-tuple a = (a1, a2, a3, a4)
with each subscript of a in {0, 1, 2,...n-1}
Define: h<sub>a : IP addresses ---> buckets *n^4 such functions*
by h<sub>a(x1,x2,x3,x4) = (a1x1 + a2x2 + a3x3 + a4x4)
<sub> = subscript notation
The above equation is then computed with modulo n to return a position within the hash table that has a 1/n probability chance of recurrence for any different inputs within U. How do I retrieve that IP address location with the original IP address?
Hashes lose information. So you cannot work backwards from the hash to the original. You store the original at the index given by its hash; you can verify that some value x is in the table by seeing whether the item at H(x) is x.
Of course, that won't work if two objects in the table have the same hash (a "collision"). Different hash-table algorithms have different strategies to deal with collisions, and for the strategy your lecturer is probably explaining, it will be useful to have another independent hash function.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With