Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set with getRandom

This is an interview question: Impement a Set class with get, put, and getRandom.

I would consider the following options:

  • sorted/unsorted linked list: get - O(N), put - O(N), getRandom - O(N)
  • unsorted vector (resizable array): get - O(N), put - ?, getRandom - O(1)
  • sorted vector (resizable array): get - O(logN), put - ?, getRandom - O(1)
  • hash table: get - O(1), put - O(1), getRandom - O(table size)
  • balanced binary search tree: get - O(logN), put - O(logN), getRandom - O(N)

It looks like the best candidates are:

  • hash table if get/put are much more frequent than getRandom
  • sorted vector (resizable array) if getRandom is much more frequent than get/put

Now I wonder if we can combine a vector and hash table somehow to make up a better set.

like image 703
Michael Avatar asked Jul 02 '26 00:07

Michael


2 Answers

You can make get, put and getRandom all be O(1) average time.

What you do is store 2 data structures. One is a hash. The other lists the elements in random order in a growable array.

When you put, you put it in the hash, add the element to the end of the array, then swap the end of the array for a random array element.

When you get, you look in the hash for the element.

When you getRandom, you take the last element of the array, and then swap that last element with another spot in the array.

If you want you can add delete as just removing the hash. Now getRandom is implemented by taking the element, checking whether it is in the hash, and if it is not then shrinking the array, and repeating. At this point getRandom is occasionally O(n) BUT the amortized average costs of all operations can be proven to be O(1).

like image 92
btilly Avatar answered Jul 03 '26 17:07

btilly


If you keep a separate structure that tells you how many items there are in each bucket of a hash table, you can use binary search to find the n-th element, which would give you O(log n) for all three operations.

A balanced binary search tree augmented with a "count" for each node (that tells how many nodes has the subtree rooted at this node) would work as well for those bounds.

Some corrections to the above: you can't do random access in a linked list, so all operations are O(N). Also, put is O(n) in both vectors due to having to displace elements in the sorted version and checking for duplicates in the unsorted version.

like image 36
ffao Avatar answered Jul 03 '26 17:07

ffao



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!