Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently picking a random element from a chained hash table?

Just for practice (and not as a homework assignment) I have been trying to solve this problem (CLRS, 3rd edition, exercise 11.2-6):

Suppose we have stored n keys in a hash table of size m, with collisions resolved by chaining, and that we know the length of each chain, including the length L of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L * (1 + m/n)).

What I thought so far is that the probability of each key being returned is 1/n. If we try to get a random value x between 1 to n, and try to find the xth key in sequence first sorted by bucket then along the chain in the bucket, then it will take O(m) to find the right bucket by going through buckets one by one and O(L) time to get the right key in chain.

like image 390
Bicheng.Cao Avatar asked Dec 25 '11 11:12

Bicheng.Cao


People also ask

Is open addressing or chaining faster?

Open-addressing is usually faster than chained hashing when the load factor is low because you don't have to follow pointers between list nodes.

What's the worst possible search time for an element in a hash table handling collision using chaining )?

The worst-case behavior of hashing with chaining is terrible: all n keys hash to the same slot, creating a list of length n. The worst-case time for searching is thus (n) plus the time to compute the hash function--no better than if we used one linked list for all the elements.

What is chaining in hash table?

Chaining is a technique used for avoiding collisions in hash tables. A collision occurs when two keys are hashed to the same index in a hash table. Collisions are a problem because every slot in a hash table is supposed to store a single element.

What's the worst running time for a look up in a Hashtable?

In hashing, all the above operations can be performed in O(1) i.e. constant time. It is important to understand that the worst case time complexity for hashing remains O(n) but the average case time complexity is O(1).


1 Answers

Repeat the following steps until an element is returned:

  • Randomly select a bucket. Let k be the number of elements in the bucket.
  • Select p uniformly at random from 1 ... L. If p <= k then return the pth element in the bucket.

It should be clear that the procedure returns an element uniformly at random. We are sort of applying rejection sampling to the elements placed in a 2D array.

The expected number of elements per bucket is n / m. The probability that the sampling attempt succeeds is (n / m) / L. The expected number of attempts needed to find a bucket is therefore L * m / n. Together with the O(L) cost of retrieving the element from the bucket, the expected running time is O(L * (1 + m / n)).

like image 188
antonakos Avatar answered Oct 21 '22 12:10

antonakos