Just for practice (and not as a homework assignment) I have been trying to solve this problem (CLRS, 3rd edition, exercise 11.2-6):
Suppose we have stored n keys in a hash table of size m, with collisions resolved by chaining, and that we know the length of each chain, including the length L of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L * (1 + m/n)).
What I thought so far is that the probability of each key being returned is 1/n. If we try to get a random value x between 1 to n, and try to find the xth key in sequence first sorted by bucket then along the chain in the bucket, then it will take O(m) to find the right bucket by going through buckets one by one and O(L) time to get the right key in chain.
Open-addressing is usually faster than chained hashing when the load factor is low because you don't have to follow pointers between list nodes.
The worst-case behavior of hashing with chaining is terrible: all n keys hash to the same slot, creating a list of length n. The worst-case time for searching is thus (n) plus the time to compute the hash function--no better than if we used one linked list for all the elements.
Chaining is a technique used for avoiding collisions in hash tables. A collision occurs when two keys are hashed to the same index in a hash table. Collisions are a problem because every slot in a hash table is supposed to store a single element.
In hashing, all the above operations can be performed in O(1) i.e. constant time. It is important to understand that the worst case time complexity for hashing remains O(n) but the average case time complexity is O(1).
Repeat the following steps until an element is returned:
k
be the number of elements in the bucket.p
uniformly at random from 1 ... L
. If p <= k
then return the p
th element in the bucket.It should be clear that the procedure returns an element uniformly at random. We are sort of applying rejection sampling to the elements placed in a 2D array.
The expected number of elements per bucket is n / m
. The probability that the sampling attempt succeeds is (n / m) / L
. The expected number of attempts needed to find a bucket is therefore L * m / n
. Together with the O(L)
cost of retrieving the element from the bucket, the expected running time is O(L * (1 + m / n))
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With